Tagging algorithms

April 10, 2002

Some methods

Rule-based: ENGTWOL.

1. Look up words; return (list of) possible tags and appropriate syntactic features for each word.

2. Apply a large number (1100) of hand-coded constraints to eliminate some of the tags.

3. Other procedures (including probabilistic ones) resolve whatever ambiguity remains.
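
Steps 1 and 2 can be sketched roughly as follows. This is only a toy illustration, not ENGTWOL itself: the lexicon, the tag names, and the single constraint are all hypothetical, and the real system uses a far richer lexicon and about 1100 hand-written constraints.

```python
# Toy lexicon: each word maps to its set of possible tags (hypothetical data).
LEXICON = {
    "the": {"DET"},
    "can": {"MODAL", "NOUN", "VERB"},
    "rusts": {"NOUN", "VERB"},
}


def no_verb_after_det(prev_tags, tags):
    """Illustrative constraint: right after an unambiguous determiner,
    rule out the MODAL and VERB readings."""
    if prev_tags == {"DET"}:
        return tags - {"MODAL", "VERB"}
    return tags


# A real rule-based tagger would list many constraints here.
CONSTRAINTS = [no_verb_after_det]


def tag_candidates(words):
    # Step 1: look up every word and collect all of its possible tags.
    candidates = [set(LEXICON.get(w.lower(), {"UNKNOWN"})) for w in words]
    # Step 2: let each constraint eliminate tags, left to right.
    for i in range(1, len(candidates)):
        for constraint in CONSTRAINTS:
            reduced = constraint(candidates[i - 1], candidates[i])
            if reduced:  # never remove the last remaining tag
                candidates[i] = reduced
    return candidates


result = tag_candidates(["the", "can", "rusts"])
```

Here "can" is reduced to a single tag, while "rusts" stays ambiguous between two tags; that leftover ambiguity is what step 3 would have to resolve.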

Stochastic: HMM (Hidden Markov Model)

Find the most probable sequence of tags, given both the tag history and the lexical identity of each word. Accuracy: 96%

Chooses the best tag sequence for a whole sentence, rather than the best tag for each individual word.
The method shown involves two steps:
· Pick the most likely tag given the (one or two) previous tag(s)
· Look at the word itself and see how likely it is given that tag
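
The two steps above correspond to the transition and emission probabilities of an HMM, and the best sequence for the whole sentence is usually found with the Viterbi algorithm. The sketch below assumes a bigram model (one previous tag) and a non-empty sentence; the tag set and all probabilities are made-up illustration values, not estimates from any corpus.

```python
import math

TAGS = ["DET", "NOUN", "VERB"]

# P(tag | previous tag); "<s>" marks the sentence start. Hypothetical numbers.
TRANS = {
    "<s>": {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1},
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}

# P(word | tag). Hypothetical numbers; unseen pairs count as probability 0.
EMIT = {
    "DET": {"the": 0.7},
    "NOUN": {"dog": 0.4, "barks": 0.1},
    "VERB": {"barks": 0.5, "dog": 0.05},
}


def logp(p):
    """Log probability, with log(0) treated as minus infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words):
    # best[i][t] = (log prob of the best path ending in tag t at word i,
    #               previous tag on that path)
    best = [{
        t: (logp(TRANS["<s>"][t]) + logp(EMIT[t].get(words[0], 0.0)), None)
        for t in TAGS
    }]
    for word in words[1:]:
        column = {}
        for t in TAGS:
            emit = logp(EMIT[t].get(word, 0.0))
            # Combine tag history (transition) with lexical likelihood (emission).
            column[t] = max(
                (best[-1][p][0] + logp(TRANS[p][t]) + emit, p) for p in TAGS
            )
        best.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))


tags = viterbi(["the", "dog", "barks"])
```

Note that "barks" on its own could be a noun or a verb here; it is the score of the whole sequence that settles the choice.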

Transformation-based: Brill tagging

Uses probabilities based on an analysis of a tagged corpus (like a stochastic tagger) to generate rules (like a rule-based tagger)
1. Learn "transformation" rules from the corpus: each rule says to change a word's tag when a given tag sequence occurs around it (up to 3 words before or after the current word)
2. Tag the text according to lexical probability, i.e. give each word its most probable tag
3. Apply the transformations repeatedly until the result is judged good enough
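
A much-simplified sketch of this idea follows. It is not Brill's actual implementation: it assumes a single rule template ("change tag A to B when the previous tag is C"), whereas the real templates look further around the word, and the lexicon and training sentence are hypothetical.

```python
# Hypothetical most probable tag per word, as if estimated from a tagged corpus.
MOST_LIKELY = {"the": "DET", "to": "TO", "race": "NOUN"}


def initial_tags(words):
    """Tag by lexical probability only (unknown words default to NOUN here)."""
    return [MOST_LIKELY.get(w, "NOUN") for w in words]


def apply_rule(tags, rule):
    """Rule (from_tag, to_tag, prev_tag): change from_tag to to_tag
    wherever the previous word currently carries prev_tag."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        # Context is read from the tags as they were before this rule fired.
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def learn_rules(words, gold, max_rules=5):
    """Greedily pick the rule that removes the most errors, then repeat."""
    tags = initial_tags(words)
    rules = []
    for _ in range(max_rules):
        errors = sum(t != g for t, g in zip(tags, gold))
        # Candidate rules are proposed from the current mistakes.
        candidates = {
            (tags[i], gold[i], tags[i - 1])
            for i in range(1, len(tags))
            if tags[i] != gold[i]
        }
        best_rule, best_errors = None, errors
        for rule in sorted(candidates):
            e = sum(t != g for t, g in zip(apply_rule(tags, rule), gold))
            if e < best_errors:
                best_rule, best_errors = rule, e
        if best_rule is None:  # no rule helps any more: stop
            break
        rules.append(best_rule)
        tags = apply_rule(tags, best_rule)
    return rules


def tag(words, rules):
    """Initial lexical tagging, then the learned transformations in order."""
    tags = initial_tags(words)
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags


train_words = ["to", "race", "the", "race"]
train_gold = ["TO", "VERB", "DET", "NOUN"]
rules = learn_rules(train_words, train_gold)
```

With this toy data the learner finds one rule, "change NOUN to VERB after TO", which fixes "to race" while leaving "the race" alone.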

Unknown words: