LIWCDictionary
recognizer
Class LIWCDictionary
java.lang.Object
  recognizer.LIWCDictionary
public class LIWCDictionary
- extends java.lang.Object
Interface to the LIWC dictionary, implementing patterns for each LIWC category based on the LIWC.CAT file (not included).
- Version:
- 1.01
- Author:
- Francois Mairesse,
| Constructor Summary | |
|---|---|
| LIWCDictionary(java.io.File catFile) | Loads the dictionary from the tab-delimited LIWC dictionary text file (with variable names as the first row). |
| Method Summary | |
|---|---|
| java.util.Map<java.lang.String,java.lang.Double> | getCounts(java.lang.String text, boolean absoluteCounts): Returns a map associating each LIWC category with the number of its occurrences in the input text. |
| static java.lang.String[] | splitSentences(java.lang.String text): Splits a text into sentences separated by a dot, exclamation point or question mark. |
| static java.lang.String[] | tokenize(java.lang.String text): Splits a text into words separated by non-word characters. |
| Methods inherited from class java.lang.Object |
|---|
| equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
LIWCDictionary
public LIWCDictionary(java.io.File catFile)
- Loads the dictionary from the tab-delimited LIWC dictionary text file (with variable names as the first row). Each word category is converted into a regular expression that is a disjunction of all its members.
- Parameters:
catFile - the dictionary file; it should point to the LIWC.CAT file of the Linguistic Inquiry and Word Count software (Pennebaker & Francis, 2001).
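The conversion of a category into a disjunction can be illustrated with a minimal sketch. It is not the class's actual implementation: `CategoryPatternSketch`, `toPattern` and `countMatches` are hypothetical names, the member words are invented for the example, and the handling of a trailing `*` as a stem wildcard is an assumption about the dictionary format.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of recognizer.LIWCDictionary.
final class CategoryPatternSketch {

    // Builds one case-insensitive pattern that matches any member of a category.
    static Pattern toPattern(List<String> members) {
        String alternatives = members.stream()
                .map(CategoryPatternSketch::memberToRegex)
                .collect(Collectors.joining("|"));
        // Word boundaries keep a member from matching inside a longer word.
        return Pattern.compile("\\b(?:" + alternatives + ")\\b", Pattern.CASE_INSENSITIVE);
    }

    // Assumption: a trailing '*' marks a stem that matches any word ending.
    private static String memberToRegex(String member) {
        if (member.endsWith("*")) {
            String stem = member.substring(0, member.length() - 1);
            return Pattern.quote(stem) + "\\w*";
        }
        return Pattern.quote(member);
    }

    // Counts non-overlapping matches of the pattern in the text.
    static int countMatches(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Pattern category = toPattern(List.of("happ*", "joy", "glad"));
        // Matches "Happy", "joy" and "happiness", so this prints 3.
        System.out.println(countMatches(category, "Happy people feel joy and happiness."));
    }
}
```

Quoting each member with `Pattern.quote` keeps any regex metacharacters in the dictionary from being interpreted as pattern syntax.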
| Method Detail |
|---|
getCounts
public java.util.Map<java.lang.String,java.lang.Double> getCounts(java.lang.String text, boolean absoluteCounts)
- Returns a map associating each LIWC category with the number of its occurrences in the input text. The counts are computed by matching the loaded patterns against the text. Punctuation counts are not produced.
- Parameters:
text - input text.
absoluteCounts - whether to include counts that aren't relative to the total word count (e.g. the actual word count).
- Returns:
- hashtable associating each LIWC category with the percentage of words in the text belonging to it.
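As a rough illustration of relative counts, the sketch below computes, for each category, the share of words that match its pattern. It is written under several assumptions and does not reproduce the class's code: `CategoryCountSketch` is a hypothetical name, the category map is passed in explicitly, percentages are expressed on a 0 to 100 scale, and the `"WC"` key used for the absolute word count is chosen only for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of recognizer.LIWCDictionary.
final class CategoryCountSketch {

    static Map<String, Double> getCounts(Map<String, Pattern> categories,
                                         String text,
                                         boolean absoluteCounts) {
        // Split on non-word characters and drop empty tokens.
        List<String> words = new ArrayList<>();
        for (String token : text.split("\\W+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }

        Map<String, Double> counts = new LinkedHashMap<>();
        if (absoluteCounts) {
            // Assumed key name for the absolute word count.
            counts.put("WC", (double) words.size());
        }
        for (Map.Entry<String, Pattern> category : categories.entrySet()) {
            int hits = 0;
            for (String word : words) {
                if (category.getValue().matcher(word).matches()) {
                    hits++;
                }
            }
            // Percentage of words in the category; 0 for an empty text.
            double share = words.isEmpty() ? 0.0 : 100.0 * hits / words.size();
            counts.put(category.getKey(), share);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Pattern> categories = new LinkedHashMap<>();
        categories.put("PRONOUN", Pattern.compile("i|we|you", Pattern.CASE_INSENSITIVE));
        System.out.println(getCounts(categories, "I think we agree", true));
    }
}
```

With the real class, the equivalent call would be `getCounts(text, absoluteCounts)` on a `LIWCDictionary` instance constructed from the LIWC.CAT file.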
splitSentences
public static java.lang.String[] splitSentences(java.lang.String text)
- Splits a text into sentences separated by a dot, exclamation point or question mark.
- Parameters:
text - text to split into sentences.
- Returns:
- an array of sentences.
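The behaviour described for `splitSentences` can be approximated with a single regular expression. This is a sketch under assumptions, not the method's source: `SentenceSplitSketch` is a hypothetical class, and trimming whitespace and dropping empty fragments are choices made for the example.

```java
import java.util.Arrays;

// Hypothetical helper, not part of recognizer.LIWCDictionary.
final class SentenceSplitSketch {

    // Splits on runs of '.', '!' or '?', then trims and drops empty fragments.
    static String[] splitSentences(String text) {
        return Arrays.stream(text.split("[.!?]+"))
                .map(String::trim)
                .filter(sentence -> !sentence.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Prints [Hello there, How are you, Fine]
        System.out.println(Arrays.toString(splitSentences("Hello there. How are you? Fine!")));
    }
}
```

A purely punctuation-based split like this one also breaks on abbreviations and decimal numbers, such as "Dr." or "3.5".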
tokenize
public static java.lang.String[] tokenize(java.lang.String text)
- Splits a text into words separated by non-word characters.
- Parameters:
text - text to tokenize.
- Returns:
- an array of words.
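Splitting on non-word characters can be sketched with the `\W+` regular expression. The class name `TokenizeSketch` is hypothetical, and filtering out empty tokens is an assumption made for the example rather than documented behaviour.

```java
import java.util.Arrays;

// Hypothetical helper, not part of recognizer.LIWCDictionary.
final class TokenizeSketch {

    // Splits on runs of non-word characters and drops empty tokens,
    // which String.split produces when the text starts with a separator.
    static String[] tokenize(String text) {
        return Arrays.stream(text.split("\\W+"))
                .filter(word -> !word.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Prints [Don, t, stop, believe]
        System.out.println(Arrays.toString(tokenize("Don't stop, believe!")));
    }
}
```

Because the apostrophe is a non-word character, a split of this kind breaks contractions such as "Don't" into two tokens.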