Class KEAPhraseFilter

java.lang.Object
  extended by weka.filters.Filter
      extended by KEAPhraseFilter
All Implemented Interfaces:
java.io.Serializable, OptionHandler

public class KEAPhraseFilter
extends Filter
implements OptionHandler

This filter splits the text in selected string attributes into phrases. The resulting string attributes contain these phrases separated by '\n' characters. Phrases are identified according to the following definitions: A phrase is a sequence of words interrupted only by sequences of whitespace characters, where each sequence of whitespace characters contains at most one '\n'. A word is a sequence of letters or digits that contains at least one letter, with the following exceptions: a) '.', '@', '_', '&', '/', '-' are allowed if surrounded by letters or digits, b) '\'' is allowed if preceded by a letter or digit, c) '-', '/' are also allowed if followed by whitespace characters that are in turn followed by another word. In that case the whitespace characters are deleted.
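The word definition above can be illustrated with a short, self-contained sketch. It is not the KEA source: the class name WordRuleSketch and the method isWord are hypothetical, only the base rule and exceptions a) and b) are covered, and exception c) and the -P option are left out. It also assumes that "letter" and "digit" mean what Character.isLetter and Character.isDigit report.

```java
// Simplified illustration of the documented word rule; not the KEA implementation.
final class WordRuleSketch {

    // Exception a): these characters are allowed only between two letters or digits.
    private static final String INTERNAL = ".@_&/-";

    // Returns true if the token satisfies the base rule plus exceptions a) and b).
    static boolean isWord(String token) {
        boolean sawLetter = false;
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                // A word must contain at least one letter.
                sawLetter |= Character.isLetter(c);
            } else if (INTERNAL.indexOf(c) >= 0) {
                boolean before = i > 0
                        && Character.isLetterOrDigit(token.charAt(i - 1));
                boolean after = i + 1 < token.length()
                        && Character.isLetterOrDigit(token.charAt(i + 1));
                if (!before || !after) {
                    return false;
                }
            } else if (c == '\'') {
                // Exception b): an apostrophe must follow a letter or digit.
                if (i == 0 || !Character.isLetterOrDigit(token.charAt(i - 1))) {
                    return false;
                }
            } else {
                return false;
            }
        }
        return sawLetter;
    }

    public static void main(String[] args) {
        String[] samples = {"e-mail", "user@host.org", "don't", "2024", "-dash", "a..b"};
        for (String s : samples) {
            System.out.println(s + " -> " + isWord(s));
        }
    }
}
```

Under these assumptions "e-mail", "user@host.org" and "don't" are accepted, while "2024" (no letter), "-dash" (leading '-') and "a..b" (a '.' not surrounded by letters or digits) are rejected.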

Version:
1.0
Author:
Eibe Frank
See Also:
Serialized Form

Constructor Summary
KEAPhraseFilter()
           
 
Method Summary
 java.lang.String attributeIndicesTipText()
          Returns the tip text for this property
 boolean batchFinished()
          Signify that this batch of input to the filter is finished.
 java.lang.String disallowInternalPeriodsTipText()
          Returns the tip text for this property
 java.lang.String getAttributeIndices()
          Get the current range selection.
 boolean getDisallowInternalPeriods()
          Get whether internal periods are disallowed
 boolean getInvertSelection()
          Get whether the supplied columns are to be processed
 java.lang.String[] getOptions()
          Gets the current settings of the filter.
 java.lang.String globalInfo()
          Returns a string describing this filter
 boolean input(Instance instance)
          Input an instance for filtering.
 java.lang.String invertSelectionTipText()
          Returns the tip text for this property
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options
static void main(java.lang.String[] argv)
          Main method for testing this class.
 void setAttributeIndices(java.lang.String rangeList)
          Set which attributes are to be processed
 void setAttributeIndicesArray(int[] attributes)
          Set which attributes are to be processed
 void setDisallowInternalPeriods(boolean disallow)
          Set whether internal periods are disallowed.
 boolean setInputFormat(Instances instanceInfo)
          Sets the format of the input instances.
 void setInvertSelection(boolean invert)
          Set whether selected columns should be processed.
 void setOptions(java.lang.String[] options)
          Parses a given list of options controlling the behaviour of this object.
 
Methods inherited from class weka.filters.Filter
batchFilterFile, filterFile, getOutputFormat, inputFormat, isOutputFormatDefined, numPendingOutput, output, outputFormat, outputPeek, useFilter
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

KEAPhraseFilter

public KEAPhraseFilter()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this filter

Returns:
a description of the filter suitable for displaying in the explorer/experimenter gui

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options

Specified by:
listOptions in interface OptionHandler
Returns:
an enumeration of all the available options

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options controlling the behaviour of this object. Valid options are:

-R index1,index2-index4,...
Specify list of attributes to process. First and last are valid indexes. (default none)

-V
Invert matching sense

-P
Disallow internal periods

Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
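To make the shape of the options array concrete, the following sketch parses the three documented flags into plain fields. It is a hypothetical stand-in written only for illustration: the class PhraseFilterOptionsSketch, its fields and its parse method are assumptions, and the real setOptions may validate or report errors differently (it declares java.lang.Exception, whereas this sketch throws IllegalArgumentException).

```java
// Hypothetical stand-in that mirrors the documented flags; not the KEA implementation.
final class PhraseFilterOptionsSketch {
    String attributeIndices = "";      // -R <range list>; the documented default is none
    boolean invertSelection;           // -V
    boolean disallowInternalPeriods;   // -P

    // Parses an options array such as {"-R", "1,3-4", "-P"}.
    static PhraseFilterOptionsSketch parse(String[] options) {
        PhraseFilterOptionsSketch parsed = new PhraseFilterOptionsSketch();
        for (int i = 0; i < options.length; i++) {
            switch (options[i]) {
                case "-R":
                    if (i + 1 >= options.length) {
                        throw new IllegalArgumentException("-R requires a range list");
                    }
                    parsed.attributeIndices = options[++i];
                    break;
                case "-V":
                    parsed.invertSelection = true;
                    break;
                case "-P":
                    parsed.disallowInternalPeriods = true;
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported option: " + options[i]);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        PhraseFilterOptionsSketch p = parse(new String[] {"-R", "1,3-4", "-P"});
        System.out.println(p.attributeIndices + " invert=" + p.invertSelection
                + " noPeriods=" + p.disallowInternalPeriods);
    }
}
```

Note that each flag and its argument occupy separate array elements, so "-R" and "1,3-4" are two entries rather than one string.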

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the filter.

Specified by:
getOptions in interface OptionHandler
Returns:
an array of strings suitable for passing to setOptions

setInputFormat

public boolean setInputFormat(Instances instanceInfo)
                       throws java.lang.Exception
Sets the format of the input instances.

Overrides:
setInputFormat in class Filter
Parameters:
instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored - only the structure is required).
Returns:
true if the outputFormat may be collected immediately
Throws:
java.lang.Exception - if the inputFormat can't be set successfully

input

public boolean input(Instance instance)
              throws java.lang.Exception
Input an instance for filtering. Ordinarily the instance is processed and made available for output immediately. Some filters require all instances be read before producing output.

Overrides:
input in class Filter
Parameters:
instance - the input instance
Returns:
true if the filtered instance may now be collected with output().
Throws:
java.lang.Exception - if the input instance was not of the correct format or if there was a problem with the filtering.

batchFinished

public boolean batchFinished()
                      throws java.lang.Exception
Signify that this batch of input to the filter is finished. If the filter requires all instances prior to filtering, output() may now be called to retrieve the filtered instances. Any subsequent instances should be filtered based on the settings obtained from the first batch (unless the inputFormat has been re-assigned or new options have been set). This default implementation assumes all instance processing occurs during inputFormat() and input().

Overrides:
batchFinished in class Filter
Returns:
true if there are instances pending output
Throws:
java.lang.NullPointerException - if no input structure has been defined
java.lang.Exception - if there was a problem finishing the batch.

main

public static void main(java.lang.String[] argv)
Main method for testing this class.

Parameters:
argv - should contain arguments to the filter: use -h for help

invertSelectionTipText

public java.lang.String invertSelectionTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getInvertSelection

public boolean getInvertSelection()
Get whether the supplied columns are to be processed

Returns:
true if the supplied columns won't be processed

setInvertSelection

public void setInvertSelection(boolean invert)
Set whether selected columns should be processed. If true the selected columns won't be processed.

Parameters:
invert - the new invert setting

disallowInternalPeriodsTipText

public java.lang.String disallowInternalPeriodsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getDisallowInternalPeriods

public boolean getDisallowInternalPeriods()
Get whether internal periods are disallowed

Returns:
true if internal periods are disallowed

setDisallowInternalPeriods

public void setDisallowInternalPeriods(boolean disallow)
Set whether internal periods are disallowed.

Parameters:
disallow - the new setting for disallowing internal periods

attributeIndicesTipText

public java.lang.String attributeIndicesTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getAttributeIndices

public java.lang.String getAttributeIndices()
Get the current range selection.

Returns:
a string containing a comma separated list of ranges

setAttributeIndices

public void setAttributeIndices(java.lang.String rangeList)
Set which attributes are to be processed

Parameters:
rangeList - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1.
eg: first-3,5,6-last
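The range-list syntax can be illustrated by expanding it into 0-based attribute indices. The sketch below is an assumption about how such a string is interpreted, written only from the description above: RangeListSketch and toZeroBased are hypothetical names, and the range class that the filter actually uses may treat edge cases (such as reversed ranges) differently.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper that expands a 1-based range list into 0-based indices.
final class RangeListSketch {

    static SortedSet<Integer> toZeroBased(String rangeList, int numAttributes) {
        SortedSet<Integer> result = new TreeSet<>();
        for (String part : rangeList.split(",")) {
            part = part.trim();
            if (part.isEmpty()) {
                continue;
            }
            int dash = part.indexOf('-');
            int from;
            int to;
            if (dash < 0) {
                from = to = index(part, numAttributes);
            } else {
                from = index(part.substring(0, dash), numAttributes);
                to = index(part.substring(dash + 1), numAttributes);
            }
            if (from > to) {
                throw new IllegalArgumentException("Reversed range: " + part);
            }
            for (int i = from; i <= to; i++) {
                result.add(i - 1); // convert the user-facing 1-based index to 0-based
            }
        }
        return result;
    }

    // Resolves "first", "last" or a number to a 1-based index and checks its bounds.
    private static int index(String token, int numAttributes) {
        token = token.trim();
        int value;
        if (token.equals("first")) {
            value = 1;
        } else if (token.equals("last")) {
            value = numAttributes;
        } else {
            value = Integer.parseInt(token);
        }
        if (value < 1 || value > numAttributes) {
            throw new IllegalArgumentException("Index out of range: " + token);
        }
        return value;
    }

    public static void main(String[] args) {
        // With 8 attributes, prints [0, 1, 2, 4, 5, 6, 7]
        System.out.println(toZeroBased("first-3,5,6-last", 8));
    }
}
```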

setAttributeIndicesArray

public void setAttributeIndicesArray(int[] attributes)
Set which attributes are to be processed

Parameters:
attributes - an array containing indexes of attributes to select. Since the array will typically come from a program, attributes are indexed from 0.
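Because setAttributeIndices uses 1-based indices and setAttributeIndicesArray uses 0-based indices, the two forms differ by one for the same selection. The following sketch shows that relationship only; IndexBaseSketch and toRangeList are hypothetical names and are not part of this class.

```java
// Hypothetical helper showing the 0-based array form next to the 1-based string form.
final class IndexBaseSketch {

    // Converts 0-based indices (as passed by a program) to a 1-based comma-separated list.
    static String toRangeList(int[] zeroBased) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < zeroBased.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(zeroBased[i] + 1);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The array {0, 2, 3} selects the same attributes as the string "1,3,4".
        System.out.println(toRangeList(new int[] {0, 2, 3}));
    }
}
```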