
Kea is distributed under the GNU
General Public License. The current version 5.0 allows free as well
as controlled indexing. It uses the latest version of the Weka
machine learning workbench.
KEA-5.0
- easy to install and use, direct from your code or from the command
line
- free or controlled indexing, with
any vocabulary in text or SKOS
format
- latest libraries, including Jena-2.4
and Weka-3.5.5
- easily applicable to new languages and domains
- distributed with sample vocabularies in 3 languages (en, es, fr)
- contains sample documents in 3 languages for creating and testing
models
Download:
Download Kea from its Google Code project page. It includes source code, required libraries, test data and documentation.
Also consider using Maui, an algorithm for topic indexing, which can be used for the same tasks as Kea, but offers additional features. Maui also allows indexing using Wikipedia as a controlled vocabulary.
Examples of controlled vocabularies that can be used with Kea (and Maui)
Documentation
Free or Controlled
Indexing?
In free indexing, keyphrases are significant terms that appear in the
document. Any phrase in the document is a potential keyphrase. The advantage
of free indexing is that it can be applied to any document. The disadvantages
are the poorer quality of the extracted phrases (compared to controlled
indexing) and inconsistent indexing across documents.
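The idea that any phrase in the document is a potential keyphrase can be illustrated with a small sketch. This is not Kea's code or API: the function name candidate_phrases, the max_words limit and the simple tokenizer are assumptions made for illustration only, and the sketch leaves out the step that scores candidates and selects the significant ones.

```python
import re

def candidate_phrases(text, max_words=3):
    """Return every run of 1..max_words consecutive words as a candidate.

    Hypothetical helper for illustration; it is not part of Kea.
    """
    # Naive tokenizer (an assumption): lowercase alphanumeric runs.
    words = re.findall(r"[a-z0-9]+", text.lower())
    candidates = set()
    for start in range(len(words)):
        for length in range(1, max_words + 1):
            # Only keep phrases that fit inside the document.
            if start + length <= len(words):
                candidates.add(" ".join(words[start:start + length]))
    return candidates

doc = "Cheap laptops for students"
phrases = candidate_phrases(doc, max_words=2)
print(sorted(phrases))
```

Because the candidates come straight from the text, this works on any document, but two documents that use different words for the same concept end up with different phrases.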
In controlled indexing, keyphrases are chosen from a controlled vocabulary
(a dictionary, thesaurus, or a list of terms). It has the advantage that
all documents are indexed in a consistent way, regardless of their wording.
For example, two documents, one about "laptops" and another
one about "notebooks", would be indexed with the same term,
which is the preferred term in the controlled vocabulary to describe this
concept.
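The "laptops" and "notebooks" example can be sketched as a lookup from entry terms to a preferred term. This is a toy illustration, not Kea's implementation: the vocabulary contents, the preferred term "portable computers" and the function name controlled_index are all hypothetical.

```python
import re

# Hypothetical toy vocabulary: each entry term maps to its preferred term.
PREFERRED_TERM = {
    "laptops": "portable computers",
    "notebooks": "portable computers",
    "portable computers": "portable computers",
}

def controlled_index(text):
    """Return the preferred terms whose entry terms occur in the text."""
    # Naive tokenizer (an assumption): lowercase alphanumeric runs.
    words = re.findall(r"[a-z0-9]+", text.lower())
    found = set()
    for start in range(len(words)):
        # Check one- and two-word phrases against the vocabulary.
        for length in (1, 2):
            phrase = " ".join(words[start:start + length])
            if phrase in PREFERRED_TERM:
                found.add(PREFERRED_TERM[phrase])
    return found

doc_a = "A review of lightweight laptops"
doc_b = "Choosing notebooks for travel"
print(controlled_index(doc_a))
print(controlled_index(doc_b))
```

Both documents receive the same index term even though they share no wording, which is the consistency that controlled indexing provides.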
Older Versions
Kea-4.1 (ZIP, 6.6 MB) -- controlled
indexing only
Kea-4.0
(ZIP, 1 MB) -- controlled indexing for agricultural documents
only.
Kea-3.0
(ZIP, 512 KB) -- free indexing only.
- It is based on the original version, which has been re-implemented
in Java. Version 3.0 additionally allows indexing German documents.
Implementing further languages is straightforward.
The oldest version of Kea is still available for download. It
is implemented in Perl and Java (and a little C) for Unix systems. It
is not straightforward to install; you will probably have to know a
little about Perl and Java. We strongly recommend you read the README
file before you attempt it, so that you know what's in store.
Here is a model for the old version of Kea that was trained on a collection
of Computer Science Technical Reports and uses domain-specific keyphrase
frequency information for better results.
Other Resources
KEA has also been integrated into the NLP workbench GATE (http://gate.ac.uk).
Please send queries regarding the KEA plugin for GATE to the GATE support
mailing list (http://gate.ac.uk/mail/index.html).
There is an IKVM version of KEA 3.0 (for .NET/C#) developed by Enrico
Lu. It is available on his website: http://enricolu.myweb.hinet.net/.
History
- Version 5.0 - Kea now combines controlled and free indexing and works
with the latest version of Weka.
- Version 4.1 - Kea now works with any controlled vocabulary in SKOS
format.
- Version 4.0 - Kea for agricultural documents
- Version 3.0 - Kea now also works for German documents
- Version 2.0 - Kea is now fully Java-based
- Version 1.1.4 - finally updated Kea-1.1.4-README.txt
to cover building models, and added a count-lines.pl
script to this end.
- Version 1.1.3 - Moved Lynx command to script that checks for conditions
that are likely to crash it.
- Version 1.1.2 - Documentation, phrase length set at command-line.
- Version 1.1.1 - Set output extension at command-line