SUMMA NEs Statistics

Functionality

Adds the following statistics to each token (or other specified annotation) in the document:

The number of times the token appears in the document, in its sentence, and in its paragraph (if sentence and paragraph annotations are present in the document)
The token's inverse document frequency (IDF)
The token's term frequency multiplied by its inverse document frequency (TF*IDF)
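
The statistics listed above can be illustrated with a minimal Python sketch. It is not the SUMMA implementation: the function name tf_idf, the layout of the table (term mapped to document count), and the formula idf = log(N / df) are all assumptions made for illustration, since this page does not state which IDF formula the component uses.

```python
import math
from collections import Counter

def tf_idf(tokens, doc_freq, n_docs):
    """Return {term: (tf, idf, tf*idf)} for one document.

    tokens   -- list of token strings from the document
    doc_freq -- hypothetical IDF table: term -> number of documents containing it
    n_docs   -- number of documents the table was built from
    """
    counts = Counter(tokens)  # term frequency within this document
    stats = {}
    for term, tf in counts.items():
        # Assumption: a term missing from the table is treated as
        # occurring in a single document.
        df = doc_freq.get(term, 1)
        idf = math.log(n_docs / df)
        stats[term] = (tf, idf, tf * idf)
    return stats

stats = tf_idf(["cat", "sat", "cat"], {"cat": 10, "sat": 100}, 1000)
```

Here "cat" occurs twice in the document and is rare in the table, so it receives a higher TF*IDF value than "sat".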

Parameters of the Resource

annSet: the annotation set where the annotations live
annType: the name of the annotations you want the statistics for (e.g. Token)
featureName: the feature you want to use for computing the statistics (e.g. string of the Token)
kindF: a feature of the annotation used to restrict which annotations are considered (e.g. the kind feature of the Token, if you want to consider only words and not numbers when computing statistics)
kindV: the value of kindF that an annotation must have to be included in the statistics (e.g. word for the kind of the Token)
parAnn: the annotation representing the paragraph
sentAnn: the annotation representing the sentence
sentStat: the name (prefix) of the feature for the sentence statistics
paraStat: the name (prefix) of the feature for the paragraph statistics
tokenStat: the name (prefix) of the feature for the document statistics (if this feature is 'token', then the features 'token', 'token_idf', and 'token_tf_idf' will be created with the appropriate values)
table: a SUMMA IDF table that must be loaded before running this component
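
To show how the parameters interact, here is a minimal Python sketch that models annotations as plain dictionaries. It does not use the GATE or SUMMA API: the function add_document_stats, the dictionary representation, and the choice to give terms missing from the table an IDF of 0.0 are assumptions for illustration only. It covers just the document-level features created from the tokenStat prefix.

```python
from collections import Counter

def add_document_stats(annotations, feature_name, kind_f, kind_v,
                       token_stat, idf_table):
    """Add <token_stat>, <token_stat>_idf and <token_stat>_tf_idf
    to every annotation whose kind_f feature equals kind_v."""
    # Restrict the computation as kindF / kindV do (e.g. kind == "word").
    selected = [a for a in annotations if a.get(kind_f) == kind_v]
    # Count occurrences of the chosen feature value (e.g. the token string).
    counts = Counter(a[feature_name] for a in selected)
    for a in selected:
        term = a[feature_name]
        tf = counts[term]
        idf = idf_table.get(term, 0.0)  # assumption: unknown terms get 0.0
        a[token_stat] = tf
        a[token_stat + "_idf"] = idf
        a[token_stat + "_tf_idf"] = tf * idf
    return annotations

tokens = [
    {"string": "summary", "kind": "word"},
    {"string": "2004", "kind": "number"},
    {"string": "summary", "kind": "word"},
]
add_document_stats(tokens, "string", "kind", "word", "token", {"summary": 1.5})
```

After the call, the two "summary" tokens carry the features token, token_idf, and token_tf_idf, while the number token is left unchanged because its kind is not word.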

Restriction

The document must contain the annotations and features the component needs in order to work correctly. The IDF table you use must have been computed from the same kind of annotations as those you want statistics for, i.e. if you want to compute "Token" statistics, then your IDF table should contain Token statistics.
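
One informal way to spot a mismatched table is to check how many of the document's terms it actually contains. The Python sketch below is only an illustration of that idea: table_coverage is a hypothetical helper, not part of SUMMA, and it assumes the table can be viewed as a mapping from term to IDF value.

```python
def table_coverage(terms, idf_table):
    """Fraction of distinct document terms that have an entry in the IDF table."""
    distinct = set(terms)
    if not distinct:
        return 1.0  # an empty document has nothing missing
    found = sum(1 for t in distinct if t in idf_table)
    return found / len(distinct)

coverage = table_coverage(["cat", "sat", "mat", "cat"], {"cat": 2.3, "sat": 4.1})
```

A very low coverage value suggests that the table was built from a different annotation type or feature than the one being processed.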
