Electronic Lexical Knowledge Base (ELKB)
Update! A newer version of this resource is available at The Open Roget's Project website.
This page presents the Electronic Lexical Knowledge Base (ELKB), software for accessing and exploring Roget's Thesaurus. The package also provides tools for natural language processing tasks such as building lexical chains and measuring semantic distance.
Keywords: Roget's thesaurus, lexical database, lexical chains, semantic distance.
An Electronic Lexical Knowledge Base (ELKB) is a model of a lexical resource, implemented in software, for classifying, indexing, storing and retrieving words together with their senses and the connections between them. It relies on a rich data repository to do so. The model defines explicit semantic relationships between words and word groups, and it lays out an automatic process for building an electronic lexicon. The lexicon is electronic not only because it is encoded in a digital format, but above all because it is computer-usable, or tractable. This ELKB was built from machine-readable text files containing the 1987 Penguin edition of Roget's Thesaurus*. It must preserve the information available in the printed Thesaurus while putting it into a tractable format.
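To make the idea of indexing and retrieving word senses concrete, here is a minimal Java sketch; it is not the ELKB's actual code or API. It assumes that each sense of a word can be identified by its location in the Thesaurus hierarchy, described by eight assumed levels (class, section, subsection, head group, head, part of speech, paragraph, semicolon group). The class `RogetIndexSketch`, the record `Sense` and the methods `add`, `senses` and `sameHead` are hypothetical names chosen for illustration. Because it uses records, it needs Java 16 or later rather than Java 1.5.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class RogetIndexSketch {

    // One sense of a word, identified by its (assumed) place in the hierarchy.
    public record Sense(String rogetClass, String section, String subsection,
                        String headGroup, String head, String partOfSpeech,
                        String paragraph, String semicolonGroup) { }

    // Hypothetical index: each word maps to every location where it occurs.
    private final Map<String, List<Sense>> index = new HashMap<>();

    // Store one occurrence of a word; keys are lower-cased so lookups ignore case.
    public void add(String word, Sense sense) {
        index.computeIfAbsent(word.toLowerCase(Locale.ROOT), k -> new ArrayList<>()).add(sense);
    }

    // Retrieve every sense of a word, or an empty list if the word is unknown.
    public List<Sense> senses(String word) {
        return index.getOrDefault(word.toLowerCase(Locale.ROOT), List.of());
    }

    // Every word with at least one sense under the same head as the given sense.
    // Assumes head labels are unique across the Thesaurus; order is unspecified.
    public List<String> sameHead(Sense sense) {
        List<String> related = new ArrayList<>();
        for (Map.Entry<String, List<Sense>> entry : index.entrySet()) {
            for (Sense s : entry.getValue()) {
                if (s.head().equals(sense.head())) {
                    related.add(entry.getKey());
                    break; // one matching sense per word is enough
                }
            }
        }
        return related;
    }
}
```

Keying the index by word and storing every location reflects the fact that a word can appear under several heads, one per sense; retrieving all the senses of a word is then a single map lookup, and explicit relations such as "filed under the same head" fall out of comparing locations.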
*In this freely available version, the 1987 edition has been replaced by the 1911 edition of Roget's Thesaurus, obtained from Project Gutenberg.
Content
All scripts were originally developed as part of Mario Jarmasz's Master's thesis at the University of Ottawa, Canada.
The ELKB (Electronic Lexical Knowledge Base) was created to access Roget's Thesaurus: originally the 1987 Penguin edition, but in this package the freely available 1911 edition described above.
This package brings together practical applications of Roget's Thesaurus, for example a program that detects lexical chains in a document and scripts that measure the semantic distance between two words by analysing the structure of the Thesaurus. A detailed description of the package is also provided.
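Semantic distance over Roget's structure can be illustrated with a simple path-length measure, and such a measure can in turn drive lexical chaining. The following is a minimal Java sketch under stated assumptions, not the ELKB's implementation: each sense is a path of eight labels from class down to semicolon group (assumed levels), every level below the deepest shared ancestor costs two edges, two words are as close as their closest pair of senses, and similarity is 16 minus the distance. The greedy `buildChains` method shows one simple way to form chains; the class name, method names and the `maxDistance` threshold are hypothetical, and the programs in this package may use different relations and rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class RogetDistanceSketch {

    // Assumed number of levels in a sense path (class ... semicolon group).
    public static final int DEPTH = 8;
    // Largest possible distance: the two paths share no level at all.
    public static final int MAX_DISTANCE = 2 * DEPTH;

    // Two edges (one up, one down) for every level below the deepest shared
    // ancestor, so two words in the same semicolon group are at distance 0.
    public static int pathDistance(String[] a, String[] b) {
        if (a.length != DEPTH || b.length != DEPTH) {
            throw new IllegalArgumentException("paths must have " + DEPTH + " levels");
        }
        int shared = 0;
        while (shared < DEPTH && a[shared].equals(b[shared])) {
            shared++;
        }
        return 2 * (DEPTH - shared);
    }

    // Word distance = closest pair of senses; MAX_DISTANCE if a word has no senses.
    public static int wordDistance(List<String[]> sensesA, List<String[]> sensesB) {
        int best = MAX_DISTANCE;
        for (String[] a : sensesA) {
            for (String[] b : sensesB) {
                best = Math.min(best, pathDistance(a, b));
            }
        }
        return best;
    }

    // Similarity grows as distance shrinks: 16 for the same group, 0 for unrelated words.
    public static int similarity(List<String[]> sensesA, List<String[]> sensesB) {
        return MAX_DISTANCE - wordDistance(sensesA, sensesB);
    }

    // Greedy lexical chaining: append each known word to the first chain that
    // already holds a word within maxDistance of it, otherwise open a new chain.
    public static List<List<String>> buildChains(List<String> words,
                                                 Map<String, List<String[]>> senses,
                                                 int maxDistance) {
        List<List<String>> chains = new ArrayList<>();
        for (String word : words) {
            List<String[]> ws = senses.getOrDefault(word, List.of());
            if (ws.isEmpty()) {
                continue; // not in the thesaurus: skip the word
            }
            List<String> target = null;
            for (List<String> chain : chains) {
                for (String member : chain) {
                    if (wordDistance(ws, senses.get(member)) <= maxDistance) {
                        target = chain;
                        break;
                    }
                }
                if (target != null) {
                    break;
                }
            }
            if (target == null) {
                target = new ArrayList<>();
                chains.add(target);
            }
            target.add(word);
        }
        return chains;
    }
}
```

With a threshold of 2, only words sharing a paragraph end up in the same chain, while a threshold of 16 puts every known word into one chain. Because the pass is greedy, the resulting chains also depend on the order in which words appear in the document.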
Installation and Usage of the ELKB
Note: The Java files were compiled with Java 1.5. If you do not have this Java version, you might have to recompile the code.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
Copyright © 2006
Jarmasz, M. and Szpakowicz, S. (2003a). Roget's Thesaurus and Semantic Similarity. Proceedings of the Conference on Recent Advances in Natural Language Processing (RANLP 2003), Borovets, Bulgaria, September, 212-219.
Jarmasz, M. and Szpakowicz, S. (2003b). Not As Easy As It Seems: Automating the Construction of Lexical Chains Using Roget's Thesaurus. Proceedings of the 16th Canadian Conference on Artificial Intelligence (AI 2003), Halifax, Canada, June, 544-549.
Jarmasz, M. and Szpakowicz, S. (2001a). The Design and Implementation of an Electronic Lexical Knowledge Base. Proceedings of the 14th Biennial Conference of the Canadian Society for Computational Studies of Intelligence (AI 2001), Ottawa, Canada, June, 325-333.
Jarmasz, M. and Szpakowicz, S. (2001b). Roget's Thesaurus: a Lexical Resource to Treasure. Proceedings of the NAACL WordNet and Other Lexical Resources Workshop, Pittsburgh, June, 186-188.
Contact Olena Medelyan for more information.