SUMMA IDF Tables
Loads a previously computed IDF table from a file. The table stays in memory for you to use.
Parameters of the Resource
- encoding: the encoding of the table
- tableLocation: the location of the table on disk. The resources directory of summa_plugin provides aquaint.idf, a table for English, and spanish_IDFs.lst, a table for Spanish. You can check the format of the tables by opening them in any text editor. The first line is the number of documents used to compute the IDF values; each remaining entry contains a word and the number of documents containing that word.
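To make the table layout concrete, here is a minimal sketch in Python of how such a file could be read outside SUMMA. It is not the plugin's own loader. It assumes that the word and its document count are separated by whitespace (check the actual files in a text editor), that the IDF is computed as log(N / df), and that unknown words get an IDF of 0.0; the function names load_idf_table and idf are hypothetical.

```python
import io
import math


def load_idf_table(stream):
    """Read a table whose first line is the document count and whose
    remaining lines each hold a word and its document frequency.
    Assumption: the two fields are separated by whitespace."""
    n_docs = int(stream.readline().strip())
    doc_freq = {}
    for line in stream:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, count = parts
        doc_freq[word] = int(count)
    return n_docs, doc_freq


def idf(word, n_docs, doc_freq):
    """Assumed formula: log(N / df). Words absent from the table
    return 0.0, which is an arbitrary choice for this sketch."""
    df = doc_freq.get(word, 0)
    if df == 0:
        return 0.0
    return math.log(n_docs / df)


# Tiny in-memory stand-in for a real table file (invented values).
sample = io.StringIO("1000\nthe 990\nsummarization 10\n")
n_docs, doc_freq = load_idf_table(sample)
```

For a real file you would open it with the encoding given in the encoding parameter, for example open(path, encoding="UTF-8"), and pass the file object to load_idf_table.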
Restriction
None.
SUMMA Corpus IDF Table
Functionality
Computes IDFs on the fly for a processed corpus. The table stays in memory for you to use.
Parameters of the Resource
- corpus: the corpus to use for creating the table.
- inputAnnotationSet: the annotation set containing the tokens used to compute the statistics
- inputAnnotationType: the annotation type of the tokens you want the statistics for
- featureName: the token feature used to compute the statistics
- normalised: a boolean indicating whether words should be lowercased before computing the statistics
- tableLocation: where you want to store your table
- createTable: a boolean indicating whether the table should be written to disk for future use
- encoding: the encoding of the table
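The computation these parameters control can be sketched in Python as follows. This illustrates document-frequency counting in general and is not the plugin's implementation. It assumes each document is already reduced to a list of token strings (the value of featureName for each inputAnnotationType annotation), and that the dumped table uses the layout "document count on the first line, then one word and count per line"; the function names are hypothetical.

```python
import io


def document_frequencies(corpus, normalised=True):
    """Count, for every word, the number of documents containing it.
    `corpus` is a list of documents, each a list of token strings."""
    doc_freq = {}
    for tokens in corpus:
        # A set ensures a word is counted once per document.
        seen = {t.lower() if normalised else t for t in tokens}
        for word in seen:
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return len(corpus), doc_freq


def dump_table(n_docs, doc_freq, stream):
    """Write the table in the assumed layout: document count first,
    then one 'word count' pair per line, sorted by word."""
    stream.write(f"{n_docs}\n")
    for word in sorted(doc_freq):
        stream.write(f"{word} {doc_freq[word]}\n")


# Invented three-document corpus for illustration.
corpus = [["The", "cat", "sat"], ["the", "dog", "sat", "sat"], ["A", "cat"]]
n_docs, doc_freq = document_frequencies(corpus, normalised=True)

out = io.StringIO()
dump_table(n_docs, doc_freq, out)
```

With normalised set to False, "The" and "the" are counted as different words, which is why lowercasing usually gives more reliable counts on small corpora.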
Restriction
Your corpus should contain the expected annotations and features. The path to the table should be valid.