Class KEAModelBuilder

java.lang.Object
  extended by KEAModelBuilder
All Implemented Interfaces:
OptionHandler

public class KEAModelBuilder
extends java.lang.Object
implements OptionHandler

Builds a keyphrase extraction model from the documents in a given directory. Assumes that the file names for the documents end with ".txt". Assumes that files containing corresponding author-assigned keyphrases end with ".key". Optionally an encoding for the documents/keyphrases can be defined (e.g. for Chinese text). Valid options are:

-l "directory name"
Specifies name of directory.

-m "model name"
Specifies name of model.

-e "encoding"
Specifies encoding.

-d
Turns debugging mode on.

-k
Use keyphrase frequency statistic.

-r
Use Agrovoc relation as a feature.

-p
Disallow internal periods.

-x "length"
Sets maximum phrase length (default: 3).

-y "length"
Sets minimum phrase length (default: 1).

-o "number"
The minimum number of times a phrase needs to occur (default: 2).

-s "name of class implementing list of stop words"
Sets list of stop words to use (default: StopwordsEnglish).

-t "name of class implementing stemmer"
Sets stemmer to use (default: IteratedLovinsStemmer).

-n
Do not check for proper nouns.

Version:
1.0
Author:
Eibe Frank
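
The options listed above are supplied as a flat string array in which each value-taking flag is followed by its argument. The following is a minimal sketch of assembling such an array. The class KeaOptionsSketch and its build method are hypothetical helpers that are not part of KEA, and the directory, model and encoding names are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of KEA): assembles an option array in the
// flag/value layout described by the option list above.
final class KeaOptionsSketch {

    static String[] build(String dirName, String modelName, String encoding,
                          int maxPhraseLength, int minPhraseLength, boolean debug) {
        List<String> opts = new ArrayList<>();
        opts.add("-l"); opts.add(dirName);       // directory holding the .txt and .key files
        opts.add("-m"); opts.add(modelName);     // name of the model
        if (encoding != null) {                  // the encoding is optional
            opts.add("-e"); opts.add(encoding);
        }
        opts.add("-x"); opts.add(Integer.toString(maxPhraseLength));
        opts.add("-y"); opts.add(Integer.toString(minPhraseLength));
        if (debug) {
            opts.add("-d");                      // boolean flags take no argument
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // "train_docs" and "my_model" are placeholder names.
        String[] opts = build("train_docs", "my_model", "UTF-8", 3, 1, true);
        System.out.println(String.join(" ", opts));
        // With KEA on the classpath, an array like this could be handed to
        // KEAModelBuilder.setOptions(String[]).
    }
}
```

Keeping the assembly in one place makes it harder to leave a value-taking flag without its argument.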

Constructor Summary
KEAModelBuilder()
           
 
Method Summary
 void buildModel(java.util.Hashtable stems)
          Builds the model from the files.
 java.util.Hashtable collectStems()
          Collects the stems of the file names.
 boolean getCheckForProperNouns()
          Get the M_CheckProperNouns value.
 boolean getDebug()
          Get the value of debug.
 java.lang.String getDirName()
          Get the value of dirName.
 boolean getDisallowIPeriods()
          Get the value of disallowIPeriods.
 java.lang.String getEncoding()
          Get the value of encoding.
 int getMaxPhraseLength()
          Get the value of MaxPhraseLength.
 int getMinNumOccur()
          Get the value of MinNumOccur.
 int getMinPhraseLength()
          Get the value of MinPhraseLength.
 java.lang.String getModelName()
          Get the value of modelName.
 java.lang.String[] getOptions()
          Gets the current option settings.
 Stemmer getStemmer()
          Get the Stemmer value.
 Stopwords getStopwords()
          Get the M_Stopwords value.
 boolean getUseKFrequency()
          Get the value of useKFrequency.
 java.lang.String getVocabulary()
          Get the value of vocabulary name.
 java.lang.String getVocabularyFormat()
          Get the value of vocabulary format.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
 static void main(java.lang.String[] ops)
          The main method.
 void saveModel()
          Saves the extraction model to the file.
 void setCheckForProperNouns(boolean newM_CheckProperNouns)
          Set the M_CheckProperNouns value.
 void setDebug(boolean newdebug)
          Set the value of debug.
 void setDirName(java.lang.String newdirName)
          Set the value of dirName.
 void setDisallowIPeriods(boolean newdisallowIPeriods)
          Set the value of disallowIPeriods.
 void setEncoding(java.lang.String newencoding)
          Set the value of encoding.
 void setMaxPhraseLength(int newMaxPhraseLength)
          Set the value of MaxPhraseLength.
 void setMinNumOccur(int newMinNumOccur)
          Set the value of MinNumOccur.
 void setMinPhraseLength(int newMinPhraseLength)
          Set the value of MinPhraseLength.
 void setModelName(java.lang.String newmodelName)
          Set the value of modelName.
 void setOptions(java.lang.String[] options)
          Parses a given list of options controlling the behaviour of this object.
 void setStemmer(Stemmer newStemmer)
          Set the Stemmer value.
 void setStopwords(Stopwords newM_Stopwords)
          Set the M_Stopwords value.
 void setUseKFrequency(boolean newuseKFrequency)
          Set the value of useKFrequency.
 void setVocabulary(java.lang.String newvocabulary)
          Set the value of vocabulary name.
 void setVocabularyFormat(java.lang.String newvocabularyFormat)
          Set the value of vocabulary format.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

KEAModelBuilder

public KEAModelBuilder()

Method Detail

getCheckForProperNouns

public boolean getCheckForProperNouns()
Get the M_CheckProperNouns value.

Returns:
the M_CheckProperNouns value.

setCheckForProperNouns

public void setCheckForProperNouns(boolean newM_CheckProperNouns)
Set the M_CheckProperNouns value.

Parameters:
newM_CheckProperNouns - The new M_CheckProperNouns value.

getStopwords

public Stopwords getStopwords()
Get the M_Stopwords value.

Returns:
the M_Stopwords value.

setStopwords

public void setStopwords(Stopwords newM_Stopwords)
Set the M_Stopwords value.

Parameters:
newM_Stopwords - The new M_Stopwords value.

getStemmer

public Stemmer getStemmer()
Get the Stemmer value.

Returns:
the Stemmer value.

setStemmer

public void setStemmer(Stemmer newStemmer)
Set the Stemmer value.

Parameters:
newStemmer - The new Stemmer value.

getMinNumOccur

public int getMinNumOccur()
Get the value of MinNumOccur.

Returns:
Value of MinNumOccur.

setMinNumOccur

public void setMinNumOccur(int newMinNumOccur)
Set the value of MinNumOccur.

Parameters:
newMinNumOccur - Value to assign to MinNumOccur.

getMaxPhraseLength

public int getMaxPhraseLength()
Get the value of MaxPhraseLength.

Returns:
Value of MaxPhraseLength.

setMaxPhraseLength

public void setMaxPhraseLength(int newMaxPhraseLength)
Set the value of MaxPhraseLength.

Parameters:
newMaxPhraseLength - Value to assign to MaxPhraseLength.

getMinPhraseLength

public int getMinPhraseLength()
Get the value of MinPhraseLength.

Returns:
Value of MinPhraseLength.

setMinPhraseLength

public void setMinPhraseLength(int newMinPhraseLength)
Set the value of MinPhraseLength.

Parameters:
newMinPhraseLength - Value to assign to MinPhraseLength.

getDisallowIPeriods

public boolean getDisallowIPeriods()
Get the value of disallowIPeriods.

Returns:
Value of disallowIPeriods.

setDisallowIPeriods

public void setDisallowIPeriods(boolean newdisallowIPeriods)
Set the value of disallowIPeriods.

Parameters:
newdisallowIPeriods - Value to assign to disallowIPeriods.

getUseKFrequency

public boolean getUseKFrequency()
Get the value of useKFrequency.

Returns:
Value of useKFrequency.

setUseKFrequency

public void setUseKFrequency(boolean newuseKFrequency)
Set the value of useKFrequency.

Parameters:
newuseKFrequency - Value to assign to useKFrequency.

getDebug

public boolean getDebug()
Get the value of debug.

Returns:
Value of debug.

setDebug

public void setDebug(boolean newdebug)
Set the value of debug.

Parameters:
newdebug - Value to assign to debug.

getEncoding

public java.lang.String getEncoding()
Get the value of encoding.

Returns:
Value of encoding.

setEncoding

public void setEncoding(java.lang.String newencoding)
Set the value of encoding.

Parameters:
newencoding - Value to assign to encoding.

getVocabulary

public java.lang.String getVocabulary()
Get the value of vocabulary name.

Returns:
Value of vocabulary name.

setVocabulary

public void setVocabulary(java.lang.String newvocabulary)
Set the value of vocabulary name.

Parameters:
newvocabulary - Value to assign to vocabulary name.

getVocabularyFormat

public java.lang.String getVocabularyFormat()
Get the value of vocabulary format.

Returns:
Value of vocabulary format.

setVocabularyFormat

public void setVocabularyFormat(java.lang.String newvocabularyFormat)
Set the value of vocabulary format.

Parameters:
newvocabularyFormat - Value to assign to vocabulary format.

getModelName

public java.lang.String getModelName()
Get the value of modelName.

Returns:
Value of modelName.

setModelName

public void setModelName(java.lang.String newmodelName)
Set the value of modelName.

Parameters:
newmodelName - Value to assign to modelName.

getDirName

public java.lang.String getDirName()
Get the value of dirName.

Returns:
Value of dirName.

setDirName

public void setDirName(java.lang.String newdirName)
Set the value of dirName.

Parameters:
newdirName - Value to assign to dirName.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options controlling the behaviour of this object. Valid options are:

-l "directory name"
Specifies name of directory.

-m "model name"
Specifies name of model.

-v "vocabulary name"
Specifies vocabulary name.

-f "vocabulary format"
Specifies vocabulary format.

-e "encoding"
Specifies encoding.

-d
Turns debugging mode on.

-k
Use keyphrase frequency statistic.

-p
Disallow internal periods.

-x "length"
Sets maximum phrase length (default: 3).

-y "length"
Sets minimum phrase length (default: 1).

-o "number"
The minimum number of times a phrase needs to occur (default: 2).

-s "name of class implementing list of stop words"
Sets list of stop words to use (default: StopwordsEnglish).

-t "name of class implementing stemmer"
Sets stemmer to use (default: IteratedLovinsStemmer).

-n
Do not check for proper nouns.

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
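
To illustrate how an option array of this shape can be interpreted, here is a minimal sketch of a parser that separates value-taking flags from boolean ones and rejects anything else. It is not KEA's actual parsing code. OptionParseSketch and its parse method are hypothetical, the flag sets simply mirror the list above, and the sketch throws an unchecked IllegalArgumentException where the documented method declares java.lang.Exception.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not KEA's implementation) of parsing the flags
// listed for setOptions into a flag -> value map.
final class OptionParseSketch {

    // Flags that are followed by an argument.
    private static final Set<String> WITH_VALUE = new HashSet<>(Arrays.asList(
            "-l", "-m", "-v", "-f", "-e", "-x", "-y", "-o", "-s", "-t"));

    // Flags that stand alone.
    private static final Set<String> BOOLEAN = new HashSet<>(Arrays.asList(
            "-d", "-k", "-p", "-n"));

    static Map<String, String> parse(String[] options) {
        Map<String, String> parsed = new LinkedHashMap<>();
        for (int i = 0; i < options.length; i++) {
            String flag = options[i];
            if (WITH_VALUE.contains(flag)) {
                if (i + 1 >= options.length) {
                    throw new IllegalArgumentException("Missing value for " + flag);
                }
                parsed.put(flag, options[++i]);   // consume the argument as well
            } else if (BOOLEAN.contains(flag)) {
                parsed.put(flag, "true");
            } else {
                throw new IllegalArgumentException("Unsupported option: " + flag);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        // "train_docs" is a placeholder directory name.
        System.out.println(parse(new String[] {"-l", "train_docs", "-x", "4", "-d"}));
    }
}
```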

getOptions

public java.lang.String[] getOptions()
Gets the current option settings.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options

collectStems

public java.util.Hashtable collectStems()
                                 throws java.lang.Exception
Collects the stems of the file names.

Throws:
java.lang.Exception
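
As an illustration of what collecting file-name stems can mean for a directory of ".txt" and ".key" files, the following sketch strips the extension from each matching file name and records the remainder once. It is an assumption about the general idea, not KEA's code. StemCollectorSketch is a hypothetical class, and the Boolean values stored in the table are placeholders.

```java
import java.io.File;
import java.util.Hashtable;

// Hypothetical sketch (not KEA's implementation): gathers the distinct
// file-name stems of the .txt and .key files in a directory.
final class StemCollectorSketch {

    static Hashtable<String, Boolean> collectStems(File dir) {
        String[] names = dir.list();
        if (names == null) {
            throw new IllegalArgumentException("Not a readable directory: " + dir);
        }
        Hashtable<String, Boolean> stems = new Hashtable<>();
        for (String name : names) {
            if (name.endsWith(".txt") || name.endsWith(".key")) {
                // Drop the four-character extension; "doc1.txt" and
                // "doc1.key" both map to the stem "doc1".
                stems.put(name.substring(0, name.length() - 4), Boolean.TRUE);
            }
        }
        return stems;
    }

    public static void main(String[] args) {
        // Uses the directory given on the command line, or the current one.
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(collectStems(dir).keySet());
    }
}
```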

buildModel

public void buildModel(java.util.Hashtable stems)
                throws java.lang.Exception
Builds the model from the files.

Throws:
java.lang.Exception

saveModel

public void saveModel()
               throws java.lang.Exception
Saves the extraction model to the file.

Throws:
java.lang.Exception
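
The signatures of setOptions, collectStems, buildModel and saveModel suggest one plausible calling order, sketched below. The order is inferred from the signatures alone and is not stated in this documentation. So that the sketch runs without KEA, it uses a hypothetical stand-in interface ModelBuilder that mirrors the documented methods, together with a stub that only records calls.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

// Hypothetical sketch: one plausible way to chain the documented methods.
final class BuildWorkflowSketch {

    // Stand-in mirroring the documented signatures; not a KEA type.
    interface ModelBuilder {
        void setOptions(String[] options) throws Exception;
        Hashtable<?, ?> collectStems() throws Exception;
        void buildModel(Hashtable<?, ?> stems) throws Exception;
        void saveModel() throws Exception;
    }

    static void run(ModelBuilder builder, String[] options) throws Exception {
        builder.setOptions(options);                    // configure directory, model name, ...
        Hashtable<?, ?> stems = builder.collectStems(); // gather the file-name stems
        builder.buildModel(stems);                      // build from the collected files
        builder.saveModel();                            // write the model out
    }

    // Stub that records the call order instead of doing real work.
    static final class RecordingBuilder implements ModelBuilder {
        final List<String> calls = new ArrayList<>();

        public void setOptions(String[] options) { calls.add("setOptions"); }

        public Hashtable<?, ?> collectStems() {
            calls.add("collectStems");
            return new Hashtable<String, Boolean>();
        }

        public void buildModel(Hashtable<?, ?> stems) { calls.add("buildModel"); }

        public void saveModel() { calls.add("saveModel"); }
    }

    public static void main(String[] args) throws Exception {
        RecordingBuilder stub = new RecordingBuilder();
        // "train_docs" and "my_model" are placeholder names.
        run(stub, new String[] {"-l", "train_docs", "-m", "my_model"});
        System.out.println(stub.calls);
    }
}
```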

main

public static void main(java.lang.String[] ops)
The main method.