com.gmail.thejcwk.semantics.phrases
Class PhraseExtractor

java.lang.Object
  extended by com.gmail.thejcwk.semantics.phrases.PhraseExtractor

public class PhraseExtractor
extends java.lang.Object

Extracts phrases from an XML file with a specified structure. All phrases are extracted at once and are stored in an ArrayList.

Version:
0.1
Author:
Jan Kroeze

Nested Class Summary
 class PhraseExtractor.PropertyLevel
          Describes at what level in the XML file we find various properties of the Phrase class.
 
Field Summary
 java.lang.String EMPTY_PHRASE_MARKER
          The symbol that denotes an empty phrase.
 int MAX_PHRASES_PER_CLAUSE
          The maximum number of phrases found organised under a clause.
 java.lang.String PHRASE_TAG_PREFIX
          The string that precedes a number in the range 1..MAX_PHRASES_PER_CLAUSE to identify a phrase.
 
Constructor Summary
PhraseExtractor(java.io.File file)
          Construct a new PhraseExtractor.
 
Method Summary
 java.util.ArrayList<Phrase> getPhrases()
          Parse the given file and return the extracted Phrases.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

EMPTY_PHRASE_MARKER

public final java.lang.String EMPTY_PHRASE_MARKER
The symbol that denotes an empty phrase.

See Also:
Constant Field Values

MAX_PHRASES_PER_CLAUSE

public final int MAX_PHRASES_PER_CLAUSE
The maximum number of phrases found organised under a clause.

See Also:
Constant Field Values

PHRASE_TAG_PREFIX

public final java.lang.String PHRASE_TAG_PREFIX
The string that precedes a number in the range 1..MAX_PHRASES_PER_CLAUSE to identify a phrase.

See Also:
Constant Field Values
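
The three fields above suggest a tag naming scheme: PHRASE_TAG_PREFIX followed by a number in 1..MAX_PHRASES_PER_CLAUSE, with EMPTY_PHRASE_MARKER marking slots to skip. The following is a minimal sketch of how such a scheme could be walked with the JDK DOM parser. The constant values, the "clause" element name, and the class and method names are assumptions made for illustration; the real values are listed under Constant Field Values, and the actual extraction logic of PhraseExtractor is not shown in this page.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch class; not part of com.gmail.thejcwk.semantics.phrases.
class PhraseTagSketch {
    // Assumed values for illustration only.
    static final String EMPTY_PHRASE_MARKER = "-";
    static final int MAX_PHRASES_PER_CLAUSE = 3;
    static final String PHRASE_TAG_PREFIX = "phrase";

    // Collects the text of every non-empty phrase element, clause by clause.
    static List<String> phraseTexts(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> result = new ArrayList<>();
        NodeList clauses = doc.getElementsByTagName("clause"); // assumed element name
        for (int c = 0; c < clauses.getLength(); c++) {
            Element clause = (Element) clauses.item(c);
            // Phrase tags are the prefix plus a number in 1..MAX_PHRASES_PER_CLAUSE.
            for (int i = 1; i <= MAX_PHRASES_PER_CLAUSE; i++) {
                NodeList matches = clause.getElementsByTagName(PHRASE_TAG_PREFIX + i);
                if (matches.getLength() == 0) {
                    continue; // this clause has no phrase in slot i
                }
                String text = matches.item(0).getTextContent().trim();
                if (!text.equals(EMPTY_PHRASE_MARKER)) {
                    result.add(text);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<clauses><clause>"
                + "<phrase1>the dog</phrase1><phrase2>-</phrase2><phrase3>barked</phrase3>"
                + "</clause></clauses>";
        System.out.println(phraseTexts(xml)); // prints [the dog, barked]
    }
}
```

In this sketch the second slot holds the assumed empty marker, so only two phrases are returned.
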
Constructor Detail

PhraseExtractor

public PhraseExtractor(java.io.File file)
                throws java.lang.IllegalArgumentException
Construct a new PhraseExtractor.

Parameters:
file - The XML file that contains the phrases to be extracted. (Precondition: file must denote a valid XML file.)
Throws:
java.lang.IllegalArgumentException - If the given file is invalid.
Method Detail

getPhrases

public java.util.ArrayList<Phrase> getPhrases()
                                       throws javax.xml.parsers.ParserConfigurationException,
                                              org.xml.sax.SAXException,
                                              java.io.IOException
Parse the given file and return the extracted Phrases.

Returns:
An ArrayList containing the extracted Phrase objects.
Throws:
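javax.xml.parsers.ParserConfigurationException - If an XML parser satisfying the requested configuration cannot be created.
org.xml.sax.SAXException - If the file cannot be parsed as XML.
java.io.IOException - If an I/O error occurs while reading the file.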
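
The constructor reports a bad argument with an unchecked IllegalArgumentException, while getPhrases() declares three checked exceptions, so a caller typically handles both kinds. The sketch below shows one possible calling pattern. Because the real classes are not included here, Phrase and PhraseExtractor are replaced by hypothetical stand-ins that copy only the documented signatures; the validity check in the stand-in constructor, the empty result list, and the names PhraseExtractorUsageSketch and describe are assumptions.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical sketch class; not part of com.gmail.thejcwk.semantics.phrases.
class PhraseExtractorUsageSketch {

    // Stand-in for the real Phrase class, whose members are not documented here.
    static class Phrase { }

    // Stand-in that mirrors the documented constructor and method signatures.
    static class PhraseExtractor {
        private final File file;

        PhraseExtractor(File file) throws IllegalArgumentException {
            // Assumed check; the real validity rule is not documented.
            if (file == null || !file.isFile()) {
                throw new IllegalArgumentException("Not a readable file: " + file);
            }
            this.file = file;
        }

        ArrayList<Phrase> getPhrases()
                throws ParserConfigurationException, SAXException, IOException {
            // Parses the file so the declared exceptions can actually occur;
            // the real phrase extraction is not reproduced.
            DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file);
            return new ArrayList<>();
        }
    }

    // Runs the extractor and maps each failure mode to a short message.
    static String describe(File file) {
        try {
            PhraseExtractor extractor = new PhraseExtractor(file);
            ArrayList<Phrase> phrases = extractor.getPhrases();
            return "extracted " + phrases.size() + " phrases";
        } catch (IllegalArgumentException e) {
            return "invalid file";
        } catch (ParserConfigurationException e) {
            return "parser unavailable";
        } catch (SAXException e) {
            return "malformed XML";
        } catch (IOException e) {
            return "I/O error";
        }
    }

    public static void main(String[] args) {
        File file = new File(args.length > 0 ? args[0] : "no-such-file-for-sketch.xml");
        System.out.println(describe(file));
    }
}
```

Since the documentation says all phrases are extracted at once, a caller would usually invoke getPhrases() a single time and keep the returned list.
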