ELKB Applications

The package applications contains several scripts for exploring the ELKB, Electronic Lexical Knowledge Base representing Roget's thesaurus (free version, 1911).
All scripts were originally developed as part of Mario
Jarmasz's Master's thesis at the University
of Ottawa, Canada. Before using the applications, make sure that ELKB is installed on your computer. The folder Resources contains files that can be used as input for each of these applications.

Lexical Chains

This program analyzes the words and phrases that appear in a document and how they are related to each other according to Roget's thesaurus. It builds lexical chains, i.e. sequences of related words, that reflect the topics of the document.

Usage: where HIndex is the Homogeneity Index. HIndex was implemented by O. Medelyan according
to functions proposed in Barzilay
& Elhadad (1997). The score of each lexical chain is:

Semantic Distance

SemDist measures the semantic distance between two words or phrases, on a scale from 4 (not similar) to 16 (very similar). There are two versions of the program:

1. SemDist requires an input file in which words or phrases are supplied as comma-separated pairs, each pair on one line. An example of an input file is MillerCharles.txt. Output examples for the different Thesaurus versions, 1987 and 1911, can also be found in the folder Resources.

Usage: where <input file> is a file with word pairs as described above, e.g. Resources/MillerCharles.txt

2. SemDist2Words takes two words as input and computes their semantic similarity.

Usage: where <word1> and
<word2> are two valid
words or phrases, e.g. "painter" and "artist". When
entering a phrase consisting of more than one word, enclose it in quotation marks.

Word Clusters

WordCluster measures the semantic distance between all combinations of words and phrases in a list. It also clusters them according to their membership in Roget's Heads. A sample input file, radioactive_materials.txt, and output files for the 1987 and 1911 versions are in the folder Resources.

Usage:

Word Power Game

WordPower answers Reader's Digest WordPower type questions:
Usage:

A sample input file is Resources/rd_july2000.txt. You will find answers detected with two versions of Roget's: 1987 and 1911.

Contact

Contact Olena Medelyan (
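Note on chain scoring

The Lexical Chains section refers to a chain score based on Barzilay & Elhadad (1997). That paper scores a chain as Length × HomogeneityIndex, where Length is the number of occurrences of chain members in the text and HomogeneityIndex is 1 minus the number of distinct members divided by Length. The Python sketch below only illustrates that formula. The function name chain_score and the sample chain are hypothetical, and it is an assumption that the HIndex computed by ELKB follows exactly this form.

```python
def chain_score(occurrences):
    """Score a lexical chain as Length * HomogeneityIndex.

    `occurrences` lists every occurrence of a chain member in the text,
    so a word that is repeated in the text appears more than once.
    This follows the formula in Barzilay & Elhadad (1997); it is an
    assumption that ELKB's HIndex is computed the same way.
    """
    length = len(occurrences)
    if length == 0:
        return 0.0
    # Distinct members, compared case-insensitively (an illustrative choice).
    distinct = len({word.lower() for word in occurrences})
    homogeneity_index = 1.0 - distinct / length
    return length * homogeneity_index


# Hypothetical chain: five occurrences, three distinct words.
sample_chain = ["machine", "device", "machine", "computer", "machine"]
print(chain_score(sample_chain))
```

Under this formula a chain in which every word occurs only once scores zero, so repeated members are what make a chain count as a strong indicator of a topic.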