Keywords and keyphrases (multi-word units) are widely used in large
document collections. They describe the content of single documents and
provide a kind of semantic metadata that is useful for a wide variety
of purposes. The task of assigning keyphrases to a document is called
keyphrase indexing. For example, academic papers are often
accompanied by a set of keyphrases freely chosen by the author.
In libraries, professional indexers select keyphrases from a controlled
vocabulary (also called Subject Headings) according to defined
cataloguing rules. On the Internet, digital libraries and other data
repositories (Flickr, del.icio.us, blog articles, etc.) also use keyphrases,
here called content tags or content labels, to organize their data and
provide thematic access to it.
KEA is an algorithm for extracting keyphrases from text documents.
It can be used either for free indexing or for indexing with
a controlled vocabulary.
KEA is implemented in Java and is platform independent. It is open-source
software distributed under the GNU
General Public License.
In real life, the Kea is one of New Zealand's native parrots, famed for theft,
destroying cars and cameras, forming street gangs, pecking sheep to death
for their delicious kidney fat, and other cutesy antics.
Thanks to Gordon
Paynter, who created the original version
of this site.
Libraries and Machine Learning Labs
Computer Science Department
The University of Waikato
Private Bag 3105
Hamilton, New Zealand
Free keyphrase indexing:
Eibe Frank (eibe |at| cs.waikato
Controlled keyphrase indexing:
Olena Medelyan (olena |at|