Class Vocabulary

java.lang.Object
  extended by Vocabulary
All Implemented Interfaces:
java.io.Serializable

public class Vocabulary
extends java.lang.Object
implements java.io.Serializable

Builds an index with the content of the controlled vocabulary. Accepts vocabularies as RDF files (SKOS format) and in plain text format:
- vocabulary_name.en (with "ID TERM" per line): descriptors and non-descriptors
- vocabulary_name.use (with "ID_NON-DESCR \t ID_DESCRIPTOR" per line)
- vocabulary_name.rel (with "ID \t RELATED_ID1 RELATED_ID2 ... " per line)
See KEA's homepage for more details.
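
The three plain-text layouts described above can be illustrated with a minimal sketch. This is not KEA's implementation: the class FlatVocabularySketch and its methods are hypothetical, and the code assumes that the first run of whitespace separates ID from TERM in *.en lines and that related ids in *.rel lines are separated by whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of KEA: parses lines in the three flat-file layouts.
class FlatVocabularySketch {

    // *.en line: "ID TERM" (assumption: the first whitespace run separates id from term).
    static Map<String, String> parseEn(List<String> lines) {
        Map<String, String> idToTerm = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+", 2);
            if (parts.length == 2) {
                idToTerm.put(parts[0], parts[1]);
            }
        }
        return idToTerm;
    }

    // *.use line: "ID_NON-DESCR \t ID_DESCRIPTOR".
    static Map<String, String> parseUse(List<String> lines) {
        Map<String, String> nonDescriptorToDescriptor = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t");
            if (parts.length == 2) {
                nonDescriptorToDescriptor.put(parts[0].trim(), parts[1].trim());
            }
        }
        return nonDescriptorToDescriptor;
    }

    // *.rel line: "ID \t RELATED_ID1 RELATED_ID2 ...".
    static Map<String, List<String>> parseRel(List<String> lines) {
        Map<String, List<String>> related = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t");
            if (parts.length == 2) {
                String[] ids = parts[1].trim().split("\\s+");
                related.put(parts[0].trim(), new ArrayList<>(Arrays.asList(ids)));
            }
        }
        return related;
    }
}
```

Lines that do not match the expected layout (for example, blank lines) are skipped in this sketch; whether KEA does the same is not stated here.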

See Also:
Serialized Form

Field Summary
static java.io.File EN
          Location of the vocabulary's *.en file containing all terms of the vocabularies and their ids.
static java.io.File REL
          Location of the vocabulary's *.rel file containing semantically related terms for each descriptor in the vocabulary.
static java.io.File SKOS
          Location of the RDF version of the controlled vocabulary. It needs to be in the SKOS format.
static java.io.File USE
          Location of the vocabulary's *.use file containing ids of non-descriptors with the corresponding ids of descriptors.
 
Constructor Summary
Vocabulary(java.lang.String vocabularyName, java.lang.String vocabularyFormat)
          Vocabulary constructor.
 
Method Summary
 void build()
          Builds the vocabulary index from the text files.
 void buildREL()
          Builds the vocabulary index with semantically related terms.
 void buildSKOS()
          Builds the vocabulary indexes from SKOS file.
 void buildUSE()
          Builds the vocabulary index with descriptors/non-descriptors relations.
 boolean containsEntry(java.lang.String phrase)
          Checks whether a normalized version of a phrase (pseudo phrase) is a valid vocabulary term.
 java.lang.String getDescriptor(java.lang.String id)
          Given the id of a non-descriptor, returns the id of the corresponding descriptor.
 java.lang.String getID(java.lang.String phrase)
          Given a phrase returns its id in the vocabulary.
 java.lang.String getOrig(java.lang.String id)
          Given id, gets the original version of vocabulary term.
 java.util.Vector getRelated(java.lang.String id)
          Given id of a term returns the list with ids of terms related to this term.
 void initialize()
          Starts initialization of the vocabulary.
 java.lang.String pseudoPhrase(java.lang.String str)
          Generates the pseudo phrase from a string.
 void setStemmer(Stemmer newStemmer)
          Set the Stemmer value.
 void setStopwords(Stopwords newM_Stopwords)
          Set the M_Stopwords value.
static java.lang.String[] sort(java.lang.String[] a)
          Sorts an array of Strings into alphabetic order.
 java.lang.String[] split(java.lang.String str, java.lang.String separator)
          Splits a string str at given character sequence (separator) into an array.
static void swap(int loc1, int loc2, java.lang.String[] a)
          Overloaded swap method: exchanges two locations in an array of Strings.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SKOS

public static java.io.File SKOS
Location of the RDF version of the controlled vocabulary. It needs to be in the SKOS format.


EN

public static java.io.File EN
Location of the vocabulary's *.en file containing all terms of the vocabularies and their ids.


USE

public static java.io.File USE
Location of the vocabulary's *.use file containing ids of non-descriptors with the corresponding ids of descriptors.


REL

public static java.io.File REL
Location of the vocabulary's *.rel file containing semantically related terms for each descriptor in the vocabulary.

Constructor Detail

Vocabulary

public Vocabulary(java.lang.String vocabularyName,
                  java.lang.String vocabularyFormat)
Vocabulary constructor. Given the name of the vocabulary and the format, it first checks whether the VOCABULARIES directory contains the specified files:
- vocabularyName.rdf if skos format is selected
- or a set of 3 flat files starting with vocabularyName and with extensions
  .en (id term)
  .use (non-descriptor \t descriptor)
  .rel (id \t related_id1 related_id2 ...)
If the required files exist, the vocabulary index is built.

Parameters:
vocabularyName - The name of the vocabulary file (before extension).
vocabularyFormat - The format of the vocabulary (skos or text).
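
The file check described for the constructor might look roughly like the following sketch. The class VocabularyFilesSketch and its methods are hypothetical and not part of KEA; the format strings "skos" and "text" and the file extensions are taken from the description above, and rejecting any other format with an exception is an assumption.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of KEA: lists the files a given format requires.
class VocabularyFilesSketch {

    // Returns the files that must exist in dir for the given vocabulary name and format.
    static List<File> requiredFiles(File dir, String vocabularyName, String vocabularyFormat) {
        List<File> files = new ArrayList<>();
        if (vocabularyFormat.equals("skos")) {
            files.add(new File(dir, vocabularyName + ".rdf"));
        } else if (vocabularyFormat.equals("text")) {
            for (String extension : new String[] {".en", ".use", ".rel"}) {
                files.add(new File(dir, vocabularyName + extension));
            }
        } else {
            // Assumption: any other format string is treated as an error.
            throw new IllegalArgumentException("Unknown vocabulary format: " + vocabularyFormat);
        }
        return files;
    }

    // True only if every required file is present as a regular file.
    static boolean allExist(List<File> files) {
        for (File file : files) {
            if (!file.isFile()) {
                return false;
            }
        }
        return true;
    }
}
```

For a hypothetical vocabulary named "agrovoc" in text format, requiredFiles(new File("VOCABULARIES"), "agrovoc", "text") lists agrovoc.en, agrovoc.use and agrovoc.rel, and the index would only be built if allExist returns true.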
Method Detail

initialize

public void initialize()
Starts initialization of the vocabulary.


setStemmer

public void setStemmer(Stemmer newStemmer)
Set the Stemmer value.

Parameters:
newStemmer - The new Stemmer value.

setStopwords

public void setStopwords(Stopwords newM_Stopwords)
Set the M_Stopwords value.

Parameters:
newM_Stopwords - The new M_Stopwords value.

buildSKOS

public void buildSKOS()
               throws java.lang.Exception
Builds the vocabulary indexes from SKOS file.

Throws:
java.lang.Exception

build

public void build()
           throws java.lang.Exception
Builds the vocabulary index from the text files.

Throws:
java.lang.Exception

buildUSE

public void buildUSE()
              throws java.lang.Exception
Builds the vocabulary index with descriptors/non-descriptors relations.

Throws:
java.lang.Exception

buildREL

public void buildREL()
              throws java.lang.Exception
Builds the vocabulary index with semantically related terms.

Throws:
java.lang.Exception

containsEntry

public boolean containsEntry(java.lang.String phrase)
Checks whether a normalized version of a phrase (pseudo phrase) is a valid vocabulary term.

Parameters:
phrase - the phrase to check
Returns:
true if phrase is in the vocabulary

getID

public java.lang.String getID(java.lang.String phrase)
Given a phrase returns its id in the vocabulary.

Parameters:
phrase - the phrase to look up
Returns:
id of the phrase in the vocabulary index

getOrig

public java.lang.String getOrig(java.lang.String id)
Given id, gets the original version of vocabulary term.

Parameters:
id - id of the vocabulary term
Returns:
original version of the vocabulary term

getDescriptor

public java.lang.String getDescriptor(java.lang.String id)
Given the id of a non-descriptor, returns the id of the corresponding descriptor.

Parameters:
id - id of the non-descriptor
Returns:
id of the descriptor

getRelated

public java.util.Vector getRelated(java.lang.String id)
Given id of a term returns the list with ids of terms related to this term.

Parameters:
id - id of the term
Returns:
a vector with ids related to the input id

split

public java.lang.String[] split(java.lang.String str,
                                java.lang.String separator)
Splits a string str at given character sequence (separator) into an array.

Parameters:
str - the string to split
separator - the character sequence at which to split
Returns:
String array with string parts separated by the separator string
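
As an illustration of splitting at a literal character sequence (rather than a regular expression), here is a minimal sketch. SplitSketch is a hypothetical class, not KEA's code; keeping empty parts and rejecting an empty separator are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch, not KEA's implementation.
class SplitSketch {

    // Splits str at every occurrence of the literal separator sequence.
    static String[] split(String str, String separator) {
        if (separator.isEmpty()) {
            // Assumption: an empty separator is rejected to avoid an endless loop.
            throw new IllegalArgumentException("separator must not be empty");
        }
        List<String> parts = new ArrayList<>();
        int start = 0;
        int index;
        while ((index = str.indexOf(separator, start)) >= 0) {
            parts.add(str.substring(start, index));
            start = index + separator.length();
        }
        parts.add(str.substring(start)); // remainder after the last separator
        return parts.toArray(new String[0]);
    }
}
```

With this sketch, a *.use style line such as "20\t12" split at "\t" yields the two parts "20" and "12".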

pseudoPhrase

public java.lang.String pseudoPhrase(java.lang.String str)
Generates the pseudo phrase from a string. A pseudo phrase is a version of a phrase that only contains non-stopwords, which are stemmed and sorted into alphabetical order.
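
The pseudo-phrase idea can be sketched as follows. This is a simplified illustration, not KEA's implementation: PseudoPhraseSketch is hypothetical, the stopword list and stemmer are passed in as a plain Set and a function instead of KEA's Stopwords and Stemmer classes, and lowercasing plus whitespace tokenization are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical stand-alone sketch, not KEA's implementation.
class PseudoPhraseSketch {

    // Drops stopwords, stems the remaining words, and sorts the stems alphabetically.
    static String pseudoPhrase(String str, Set<String> stopwords, UnaryOperator<String> stemmer) {
        List<String> stems = new ArrayList<>();
        // Assumption: words are lowercased and separated by whitespace.
        for (String word : str.toLowerCase().trim().split("\\s+")) {
            if (word.isEmpty() || stopwords.contains(word)) {
                continue;
            }
            stems.add(stemmer.apply(word));
        }
        Collections.sort(stems);
        return String.join(" ", stems);
    }
}
```

Because word order and stopwords are discarded, differently worded phrases can map to the same pseudo phrase. With a toy stemmer that only strips a trailing "s" and the stopwords "the" and "of", both "the effects of Soils" and "Soil effects" become "effect soil".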


swap

public static void swap(int loc1,
                        int loc2,
                        java.lang.String[] a)
Overloaded swap method: exchanges two locations in an array of Strings.


sort

public static java.lang.String[] sort(java.lang.String[] a)
Sorts an array of Strings into alphabetic order.
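
How swap and sort could fit together is sketched below. The sorting algorithm KEA actually uses is not documented here, so the selection sort is purely an assumption for illustration, as are sorting in place, returning the same array, and using String.compareTo for ordering. SortSketch is a hypothetical class.

```java
// Hypothetical stand-alone sketch, not KEA's implementation.
class SortSketch {

    // Exchanges the elements at positions loc1 and loc2 of the array.
    static void swap(int loc1, int loc2, String[] a) {
        String tmp = a[loc1];
        a[loc1] = a[loc2];
        a[loc2] = tmp;
    }

    // Selection sort (assumed algorithm): sorts in place and returns the same array.
    static String[] sort(String[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j].compareTo(a[min]) < 0) {
                    min = j;
                }
            }
            if (min != i) {
                swap(i, min, a);
            }
        }
        return a;
    }
}
```

Note that String.compareTo orders by Unicode code unit, so uppercase letters sort before lowercase ones; input that has already been lowercased avoids that surprise.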