Tagging algorithms, April 10, 2002
Some methods
Rule-based: ENGTWOL
1. Look up each word in the lexicon; return the list of its possible tags, with the appropriate syntactic features for each tag.
2. Apply a large set (1,100) of hand-coded constraints that eliminate tags incompatible with the context.
3. Other procedures (including probabilities) resolve any remaining ambiguity.
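A minimal Python sketch of these three steps, assuming a toy lexicon, a single made-up constraint, and made-up tag frequencies. ENGTWOL's real lexicon and constraint grammar are far larger and are not reproduced here; names such as LEXICON, TAG_FREQ and after_det_not_verb are hypothetical.

```python
# Step 1 data: a toy lexicon mapping each word to its possible tags (hypothetical entries).
LEXICON = {
    "the": {"DET"},
    "can": {"MD", "NN", "VB"},
    "rusted": {"VBD", "VBN"},
}

# Step 3 data: made-up tag frequencies, used only to break remaining ties.
TAG_FREQ = {"DET": 1.0, "NN": 0.5, "MD": 0.3, "VB": 0.2, "VBD": 0.6, "VBN": 0.4}


def lookup(words):
    """Step 1: return the set of candidate tags for each word (unknown words -> NN)."""
    return [set(LEXICON.get(w.lower(), {"NN"})) for w in words]


def after_det_not_verb(cands, i):
    """Hypothetical constraint: right after an unambiguous determiner,
    a word cannot be a verb or a modal."""
    if i > 0 and cands[i - 1] == {"DET"}:
        return cands[i] - {"VB", "MD"}
    return cands[i]


CONSTRAINTS = [after_det_not_verb]  # ENGTWOL uses about 1,100 of these


def tag(words):
    cands = lookup(words)
    # Step 2: apply constraints until nothing changes; never delete a word's last tag.
    changed = True
    while changed:
        changed = False
        for i in range(len(cands)):
            for constraint in CONSTRAINTS:
                reduced = constraint(cands, i)
                if reduced and reduced != cands[i]:
                    cands[i] = reduced
                    changed = True
    # Step 3: resolve any remaining ambiguity with the most frequent tag.
    return [max(sorted(c), key=lambda t: TAG_FREQ.get(t, 0.0)) for c in cands]


print(tag(["The", "can", "rusted"]))  # ['DET', 'NN', 'VBD']
```

Here the constraint removes the verb and modal readings of "can" after "the", and the frequency table then picks VBD over VBN for "rusted".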
Stochastic: HMM (Hidden Markov Model)
Finds the most probable sequence of tags, given both the tag history and the lexical identity of each word. Accuracy: 96%.
Chooses the best tag sequence for the whole sentence, rather than the best tag for each individual word.
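For reference, the usual bigram HMM formulation can be written out as below. This is a sketch under the standard assumptions, not necessarily the notation used in lecture: P(w_1^n) is dropped because it is the same for every candidate tag sequence, each word depends only on its own tag, and each tag depends only on the previous tag (a trigram model would condition on t_{i-1} and t_{i-2} instead). The first factor in the product corresponds to looking at the word given the tag, the second to picking a tag given the previous tag.

```latex
\hat{t}_1^n
  = \arg\max_{t_1^n} P(t_1^n \mid w_1^n)
  = \arg\max_{t_1^n} P(w_1^n \mid t_1^n)\, P(t_1^n)
  \approx \arg\max_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
```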
The method shown involves two steps:
· Pick the most likely tag given the previous one or two tags (a bigram or trigram model)
· Look at the word itself and check how likely it is, given that tag
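The search for the best whole-sentence sequence is usually done with the Viterbi algorithm. Below is a minimal sketch, assuming a bigram model whose probability tables are supplied as dictionaries; the function name viterbi, the start symbol "<s>", and the toy tags and numbers are all hypothetical, not taken from any real tagger. It works in log space to avoid numeric underflow on long sentences.

```python
import math


def viterbi(words, tags, trans, emit, start="<s>"):
    """Return the most probable tag sequence under a bigram HMM.

    trans[(prev, t)] = P(t | prev); emit[(t, w)] = P(w | t).
    Missing entries are treated as probability 0.
    """
    def logp(table, key):
        p = table.get(key, 0.0)
        return math.log(p) if p > 0 else float("-inf")

    # best[i][t]: best log-probability of any tag path ending in tag t at word i
    best = [{} for _ in words]
    back = [{} for _ in words]
    for t in tags:
        best[0][t] = logp(trans, (start, t)) + logp(emit, (t, words[0]))
        back[0][t] = None
    for i in range(1, len(words)):
        for t in tags:
            # Step 1: best previous tag, scored by path so far plus P(t | prev)
            prev, score = max(
                ((p, best[i - 1][p] + logp(trans, (p, t))) for p in tags),
                key=lambda x: x[1],
            )
            # Step 2: how likely is this word, given tag t
            best[i][t] = score + logp(emit, (t, words[i]))
            back[i][t] = prev
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy parameters (hypothetical numbers).
TAGS = ["DET", "NN", "VB"]
TRANS = {("<s>", "DET"): 0.6, ("<s>", "NN"): 0.3, ("<s>", "VB"): 0.1,
         ("DET", "NN"): 0.9, ("DET", "VB"): 0.1,
         ("NN", "VB"): 0.6, ("NN", "NN"): 0.3,
         ("VB", "DET"): 0.5, ("VB", "NN"): 0.3}
EMIT = {("DET", "the"): 0.7, ("NN", "dog"): 0.2, ("VB", "dog"): 0.01,
        ("NN", "barks"): 0.05, ("VB", "barks"): 0.1}

print(viterbi("the dog barks".split(), TAGS, TRANS, EMIT))  # ['DET', 'NN', 'VB']
```

Because the score of each partial path is kept for every tag, "barks" can be tagged VB on the strength of the NN to VB transition, even though the word alone would not decide it.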
Transformation-based: Brill tagging
Uses probabilities based on an analysis of a tagged corpus (like a stochastic tagger) to generate rules (like a rule-based tagger)
1. Generate "transformational" rules from the corpus: given a tag context (up to 3 words before or after the current word), change the current tag to another tag
2. Tag the corpus with each word's most likely tag (lexical probability)
3. Apply the transformations cyclically until a stopping criterion is met (e.g., no transformation improves the tagging by more than a chosen threshold)
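A small sketch of transformation-based learning, assuming a single simplified rule template ("change tag A to B when the previous tag is C") instead of the full set of templates covering up to 3 words on either side. All function names and the toy corpus are hypothetical. Rules are learned greedily: in each round, the candidate with the largest net gain (errors fixed minus errors introduced) on the training corpus is kept, and learning stops when no rule gains at least min_gain.

```python
from collections import Counter, defaultdict


def most_frequent_tags(corpus):
    """Map each word to the tag it carries most often in the tagged corpus."""
    counts = defaultdict(Counter)
    for word, tag in corpus:
        counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}


def initial_tags(words, lexicon, default="NN"):
    # Step 2: tag by lexical probability (unknown words get a default tag).
    return [lexicon.get(w, default) for w in words]


def apply_rule(tags, rule):
    """rule = (from_tag, to_tag, prev_tag). Conditions are checked against the
    tags as they were before this rule ran, so all matches change at once."""
    frm, to, prev = rule
    return [to if t == frm and i > 0 and tags[i - 1] == prev else t
            for i, t in enumerate(tags)]


def learn_rules(corpus, lexicon, min_gain=1, max_rules=50):
    """Step 1: greedily learn rules that fix errors of the initial tagging."""
    words = [w for w, _ in corpus]
    gold = [t for _, t in corpus]
    tags = initial_tags(words, lexicon)
    rules = []
    while len(rules) < max_rules:
        # Candidate rules are instantiated from the current errors.
        candidates = {(tags[i], gold[i], tags[i - 1])
                      for i in range(1, len(tags)) if tags[i] != gold[i]}
        correct_now = sum(t == g for t, g in zip(tags, gold))
        best, best_gain = None, 0
        for rule in sorted(candidates):
            new = apply_rule(tags, rule)
            gain = sum(n == g for n, g in zip(new, gold)) - correct_now
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None or best_gain < min_gain:
            break  # stopping criterion: no rule helps enough
        rules.append(best)
        tags = apply_rule(tags, best)
    return rules


def brill_tag(words, lexicon, rules):
    """Step 3: initial tagging, then the learned rules in the order learned."""
    tags = initial_tags(words, lexicon)
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags


CORPUS = [("the", "DET"), ("race", "NN"), ("ended", "VBD"),
          ("they", "PRP"), ("want", "VBP"), ("to", "TO"), ("race", "VB"),
          ("the", "DET"), ("race", "NN"), ("is", "VBZ"), ("long", "JJ")]

lexicon = most_frequent_tags(CORPUS)
rules = learn_rules(CORPUS, lexicon)
print(rules)  # [('NN', 'VB', 'TO')]
print(brill_tag(["they", "want", "to", "race"], lexicon, rules))  # ['PRP', 'VBP', 'TO', 'VB']
```

"race" is most often NN in the toy corpus, so the initial tagging gets "to race" wrong; the learned rule changes NN to VB after TO. A real Brill tagger uses many more templates and a much larger corpus.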
Unknown words: