LIWCDictionary

recognizer
Class LIWCDictionary

java.lang.Object extended by recognizer.LIWCDictionary

public class LIWCDictionary
extends java.lang.Object

Interface to the LIWC dictionary, implementing patterns for each LIWC category based on the LIWC.CAT file (not included).

Version:
1.01
Author:
Francois Mairesse

Constructor Summary
LIWCDictionary(java.io.File catFile)
          Loads the dictionary from a tab-delimited LIWC dictionary text file (with variable names in the first row).
 
Method Summary
 java.util.Map<java.lang.String,java.lang.Double> getCounts(java.lang.String text, boolean absoluteCounts)
          Returns a map associating each LIWC category with the number of its occurrences in the input text.
static java.lang.String[] splitSentences(java.lang.String text)
          Splits a text into sentences separated by a dot, exclamation point or question mark.
static java.lang.String[] tokenize(java.lang.String text)
          Splits a text into words separated by non-word characters.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Constructor Detail

LIWCDictionary

public LIWCDictionary(java.io.File catFile)
Loads the dictionary from a tab-delimited LIWC dictionary text file (with variable names in the first row). Each word category is converted into a regular expression that is a disjunction of all its members.
Parameters:
catFile - dictionary file; it should point to the LIWC.CAT file of the Linguistic Inquiry and Word Count software (Pennebaker & Francis, 2001).
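
The conversion of a word category into a regular-expression disjunction can be illustrated with a minimal sketch. This is not the class's actual code: the class name CategoryPatternSketch, the helper names, the case-insensitive matching, and the treatment of a trailing '*' as a stem wildcard are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class CategoryPatternSketch {

    // Builds one pattern that matches any single member of a category.
    // Assumption: matching is case-insensitive.
    static Pattern toPattern(List<String> members) {
        String disjunction = members.stream()
                .map(CategoryPatternSketch::entryToRegex)
                .collect(Collectors.joining("|"));
        return Pattern.compile("(?:" + disjunction + ")", Pattern.CASE_INSENSITIVE);
    }

    // Assumption: a trailing '*' means "this stem followed by any word characters".
    // Everything else is quoted so that it is matched literally.
    static String entryToRegex(String entry) {
        if (entry.endsWith("*")) {
            String stem = entry.substring(0, entry.length() - 1);
            return Pattern.quote(stem) + "\\w*";
        }
        return Pattern.quote(entry);
    }

    public static void main(String[] args) {
        // Hypothetical category members, not taken from the real dictionary.
        Pattern category = toPattern(Arrays.asList("happy", "joy*", "love"));
        System.out.println(category.matcher("joyful").matches()); // true
        System.out.println(category.matcher("lovely").matches()); // false
    }
}
```

Compiling each category once at load time means that counting later only needs one match attempt per word and category.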
Method Detail

getCounts

public java.util.Map<java.lang.String,java.lang.Double> getCounts(java.lang.String text, boolean absoluteCounts)
Returns a map associating each LIWC category with the number of its occurrences in the input text. The counts are computed by matching the input against the loaded category patterns. Punctuation counts are not produced.
Parameters:
text - input text.
absoluteCounts - if true, also includes counts that are not relative to the total word count (e.g. the actual word count).
Returns:
a map associating each LIWC category with the percentage of words in the text belonging to it.
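
The difference between relative and absolute counts can be sketched as follows. This is an illustration under stated assumptions, not the class's implementation: the class CountsSketch, the extra categories parameter, the "WC" key for the absolute word count, and the example category are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class CountsSketch {

    // Returns, for each category, the percentage of words matching its pattern.
    // Assumption: with absoluteCounts set, the raw word count is added under "WC".
    static Map<String, Double> getCounts(Map<String, Pattern> categories,
                                         String text, boolean absoluteCounts) {
        // Split on non-word characters and drop empty tokens.
        List<String> words = new ArrayList<>();
        for (String token : text.split("\\W+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }

        Map<String, Double> counts = new LinkedHashMap<>();
        if (absoluteCounts) {
            counts.put("WC", (double) words.size());
        }
        for (Map.Entry<String, Pattern> category : categories.entrySet()) {
            int matches = 0;
            for (String word : words) {
                if (category.getValue().matcher(word).matches()) {
                    matches++;
                }
            }
            // Guard against division by zero for empty input.
            double percentage = words.isEmpty() ? 0.0 : 100.0 * matches / words.size();
            counts.put(category.getKey(), percentage);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Pattern> categories = new LinkedHashMap<>();
        categories.put("PRONOUN", Pattern.compile("i|you|we", Pattern.CASE_INSENSITIVE));
        // 2 of the 4 words match, so PRONOUN is 50.0; WC is the absolute word count.
        System.out.println(getCounts(categories, "I think you know", true));
    }
}
```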

splitSentences

public static java.lang.String[] splitSentences(java.lang.String text)
Splits a text into sentences separated by a dot, exclamation point or question mark.
Parameters:
text - text to split into sentences.
Returns:
an array of sentences.
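
A minimal sketch of splitting on the three sentence terminators named above is shown here. The class SentenceSplitSketch is hypothetical, and trimming whitespace and dropping empty fragments are assumptions about the desired behaviour rather than documented features.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SentenceSplitSketch {

    // Splits on runs of '.', '!' or '?'.
    // Assumption: surrounding whitespace is trimmed and empty fragments are dropped.
    static String[] splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String part : text.split("[.!?]+")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                sentences.add(trimmed);
            }
        }
        return sentences.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] result = splitSentences("Hello there. How are you? Fine!");
        System.out.println(Arrays.toString(result)); // [Hello there, How are you, Fine]
    }
}
```

A purely punctuation-based split like this one also breaks on abbreviations such as "e.g.", which can inflate the sentence count.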

tokenize

public static java.lang.String[] tokenize(java.lang.String text)
Splits a text into words separated by non-word characters.
Parameters:
text - text to tokenize.
Returns:
an array of words.
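
Splitting on non-word characters can be sketched as below, reading "non-word character" as the regular-expression class \W. That reading, the class name TokenizeSketch, and the removal of empty tokens are assumptions, not confirmed details of the class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenizeSketch {

    // Splits on runs of non-word characters (\W: anything except letters, digits, '_').
    // Assumption: empty tokens, e.g. from leading punctuation, are dropped.
    static String[] tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.split("\\W+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] result = tokenize("Don't panic, it's fine.");
        System.out.println(Arrays.toString(result)); // [Don, t, panic, it, s, fine]
    }
}
```

Under this reading an apostrophe is a separator, so contractions such as "Don't" become two tokens, which affects any word count derived from the result.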