The package applications contains several scripts for exploring the ELKB, the Electronic Lexical Knowledge Base that represents Roget's Thesaurus (free version, 1911).
Before using the applications, make sure that ELKB is installed on your computer. The folder Resources contains files that can be used as input for each of these applications.
This program analyzes the words and phrases that appear in a document and how they are related to each other according to Roget's Thesaurus. It builds lexical chains, i.e. sequences of related words, that reflect the topics of the document.
The Homogeneity Index (HIndex) was implemented by O. Medelyan according to the functions proposed in Barzilay & Elhadad (1997). The score of each lexical chain is:
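In Barzilay & Elhadad (1997), a chain is scored as Score(Chain) = Length * HomogeneityIndex, where Length is the number of occurrences of the chain's members in the text and HomogeneityIndex = 1 - (number of distinct members / Length). The Java sketch below illustrates that definition only. The class and method names are hypothetical and are not part of the ELKB, and HIndex itself may differ in detail.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not part of the ELKB API.
public class ChainScore {

    // HomogeneityIndex = 1 - (distinct members / total occurrences).
    public static double homogeneityIndex(List<String> occurrences) {
        if (occurrences.isEmpty()) {
            return 0.0; // assumption: an empty chain scores zero
        }
        Set<String> distinct = new HashSet<>(occurrences);
        return 1.0 - (double) distinct.size() / occurrences.size();
    }

    // Score(Chain) = Length * HomogeneityIndex.
    public static double score(List<String> occurrences) {
        return occurrences.size() * homogeneityIndex(occurrences);
    }

    public static void main(String[] args) {
        // Four occurrences, two distinct members: 4 * (1 - 2/4)
        List<String> chain = Arrays.asList("machine", "device", "machine", "machine");
        System.out.println(score(chain)); // prints 2.0
    }
}
```

Under this definition a chain whose members each occur only once scores 0, so repeated words are what make a chain count as a topic.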
SemDist measures the semantic distance between two words or phrases, on a scale from 0 (not similar) to 16 (very similar). There are two versions of the program:
1. SemDist requires an input file in which words or phrases are supplied as comma-separated pairs, each pair on one line. An example input file is MillerCharles.txt. Example output for the 1987 and 1911 versions of the Thesaurus can also be found in the folder Resources.
where <input file> is a file with word pairs as described above, e.g. Resources/MillerCharles.txt
2. SemDist2Words takes two words as input and computes their semantic similarity.
where <word1> and <word2> are two valid words or phrases, e.g. "painter" and "artist". When entering a phrase consisting of more than one word, enclose it in quotation marks.
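The input format that SemDist expects can be illustrated with a short Java sketch. It assumes one comma-separated pair per line and skips blank lines. The class PairFileReader and its method are hypothetical and not part of the ELKB, and the sample lines are illustrative rather than quoted from MillerCharles.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of the ELKB API.
public class PairFileReader {

    // Turns lines such as "car,automobile" into {"car", "automobile"}.
    public static List<String[]> parsePairs(List<String> lines) {
        List<String[]> pairs = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            // Split on the first comma only, so each side may be a phrase.
            String[] parts = trimmed.split(",", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException(
                        "Expected two comma-separated entries: " + line);
            }
            pairs.add(new String[] { parts[0].trim(), parts[1].trim() });
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("car,automobile", "journey,voyage");
        for (String[] pair : parsePairs(lines)) {
            System.out.println(pair[0] + " | " + pair[1]);
        }
    }
}
```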
WordCluster measures the semantic distance between all combinations of words and phrases in a list. It also clusters them according to their membership in Roget's Heads. A sample input file, radioactive_materials.txt, and sample output files for the 1987 and 1911 versions are in the folder Resources.
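As a rough illustration of what "all combinations" involves, the Java sketch below enumerates every unordered pair from a list, which is the set of pairs a distance measure would need to be applied to. It assumes that combinations means unordered pairs of distinct list entries. The class AllPairs is hypothetical and not part of the ELKB, and the sample words are illustrative rather than taken from radioactive_materials.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of the ELKB API.
public class AllPairs {

    // Returns every unordered pair (i < j) of entries in the list.
    public static List<String[]> allPairs(List<String> items) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                pairs.add(new String[] { items.get(i), items.get(j) });
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("uranium", "plutonium", "radium", "thorium");
        // A list of n entries yields n * (n - 1) / 2 pairs.
        System.out.println(allPairs(words).size()); // prints 6
    }
}
```

The number of pairs grows quadratically with the length of the list, so long input lists mean many distance computations.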
WordPower answers Reader's Digest WordPower type questions:
Contact Olena Medelyan for more information.