java.lang.Object
- edu.upf.taln.dri.module.summary.util.similarity.TFIDFVectorWiki

```
public class TFIDFVectorWiki
extends Object
```
Utility class to compute TF-IDF vectors from textual excerpts.

Field Summary

Fields
Modifier and Type	Field	Description
`boolean`	`appendPOS`
`boolean`	`getLemma`
`boolean`	`onlyWordKind`
`boolean`	`removeStopWords`
`Set<String>`	`stopWordsList`
`boolean`	`toLowerCase`

Constructor Summary

Constructors
Constructor Description

TFIDFVectorWiki(SimLangENUM langIN)

Method Summary

Modifier and Type	Method	Description
`Map<String,Double>`	`computeTFIDFvect(gate.Annotation ann, gate.Document doc)`
`double`	`cosSimTFIDF(gate.Annotation ann1, gate.Document doc1, gate.Annotation ann2, gate.Document doc2)`	Compute the TF-IDF similarity between the two document annotations. Term frequency of a sentence: the number of times the token appears in the sentence. Inverse document frequency: the logarithm of the total number of documents divided by the number of documents in which the token appears.
`double`	`cosSimTFIDF(Map<String,Double> tokenDoc1, Map<String,Double> tokenDoc2)`	Compute the TF-IDF similarity between the two token lists. Term frequency of a sentence: the number of times the token appears in the sentence. Inverse document frequency: the logarithm of the total number of documents divided by the number of documents in which the token appears.
`static List<String>`	`extractTokenList(gate.Annotation ann, gate.Document doc, TokenFilterInterface tokenFilter, boolean onlyWordKind, boolean getLemma, boolean toLowerCase, boolean appendPOS, boolean removeStopWords, Set<String> stopWordsList)`	Given an annotation of a TDDocument, extract the list of its tokens (a token that occurs several times is listed once per occurrence).
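The TF and IDF definitions in the method descriptions above can be written out as follows. Multiplying the two into a per-token weight and comparing the resulting vectors with the cosine measure is an assumption based on the usual TF-IDF scheme and on the method name `cosSimTFIDF`; the base of the logarithm is not stated in this documentation.

```latex
\begin{aligned}
\mathrm{tf}(t, s) &= \text{number of occurrences of token } t \text{ in sentence } s \\
\mathrm{idf}(t) &= \log \frac{N}{\mathrm{df}(t)}, \quad N = \text{total number of documents},\ \mathrm{df}(t) = \text{number of documents containing } t \\
w_{t,s} &= \mathrm{tf}(t, s)\,\mathrm{idf}(t) \\
\cos(s_1, s_2) &= \frac{\sum_t w_{t,s_1}\, w_{t,s_2}}{\sqrt{\sum_t w_{t,s_1}^2}\,\sqrt{\sum_t w_{t,s_2}^2}}
\end{aligned}
```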

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

onlyWordKind
```
public boolean onlyWordKind
```

getLemma
```
public boolean getLemma
```

toLowerCase
```
public boolean toLowerCase
```

removeStopWords
```
public boolean removeStopWords
```

appendPOS
```
public boolean appendPOS
```

stopWordsList
```
public Set<String> stopWordsList
```

Constructor Detail

TFIDFVectorWiki

```
public TFIDFVectorWiki(SimLangENUM langIN)
                throws InvalidParameterException
```

Throws:

InvalidParameterException

Method Detail

computeTFIDFvect

```
public Map<String,Double> computeTFIDFvect(gate.Annotation ann,
                                           gate.Document doc)
```
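A minimal sketch of what building a TF-IDF vector for one excerpt involves, using plain Java collections instead of GATE annotations. Everything here is hypothetical: the class `TfIdfSketch`, the method `tfidfVector`, and the `docFreq` table (token to number of corpus documents containing it) are illustrations, since this page does not say where the class obtains its document-frequency statistics. The natural logarithm and skipping tokens with no document frequency are also assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TfIdfSketch {

    /**
     * Hypothetical TF-IDF vector builder.
     * tokens:    the excerpt's tokens, one entry per occurrence
     * docFreq:   hypothetical table token -> number of documents containing it
     * totalDocs: total number of documents in the corpus
     */
    public static Map<String, Double> tfidfVector(List<String> tokens,
                                                  Map<String, Integer> docFreq,
                                                  int totalDocs) {
        // Term frequency: raw count of each token in the excerpt
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens) {
            tf.merge(t, 1, Integer::sum);
        }
        Map<String, Double> vector = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            // Assumption: tokens absent from the corpus table are skipped
            Integer df = docFreq.get(e.getKey());
            if (df == null || df == 0) {
                continue;
            }
            // IDF: natural log of totalDocs / df (log base is an assumption)
            double idf = Math.log((double) totalDocs / df);
            vector.put(e.getKey(), e.getValue() * idf);
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreq = Map.of("the", 100, "cat", 10, "sat", 5);
        Map<String, Double> v = tfidfVector(List.of("the", "cat", "sat", "cat"), docFreq, 100);
        System.out.println(v);
    }
}
```

A token that occurs in every document ("the" above) gets an IDF of log(1) = 0, so it contributes nothing to the vector, which is the usual motivation for the IDF factor.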

cosSimTFIDF
```
public double cosSimTFIDF(Map<String,Double> tokenDoc1,
                          Map<String,Double> tokenDoc2)
```
Compute the TF-IDF similarity between the two token lists. Term frequency of a sentence: the number of times the token appears in the sentence. Inverse document frequency: the logarithm of the total number of documents divided by the number of documents in which the token appears.

Parameters:

tokenDoc1 -

tokenDoc2 -

Returns:

the TF-IDF similarity of the two token lists
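A minimal sketch of cosine similarity between two sparse vectors stored as `Map<String,Double>`, the parameter type of this overload. It assumes, from the method name, that the score is the standard cosine measure; the class `CosineSketch`, its method names, and returning 0.0 for an all-zero vector are hypothetical choices, not this library's code.

```java
import java.util.Map;

public class CosineSketch {

    /** Cosine similarity of two sparse vectors stored as token -> weight maps. */
    public static double cosine(Map<String, Double> v1, Map<String, Double> v2) {
        // Dot product: only tokens present in both maps contribute,
        // so iterate over the smaller map and look up in the larger one
        Map<String, Double> small = v1.size() <= v2.size() ? v1 : v2;
        Map<String, Double> large = (small == v1) ? v2 : v1;
        double dot = 0.0;
        for (Map.Entry<String, Double> e : small.entrySet()) {
            Double w = large.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double norm1 = norm(v1);
        double norm2 = norm(v2);
        // Assumption: an empty or all-zero vector yields similarity 0.0
        if (norm1 == 0.0 || norm2 == 0.0) {
            return 0.0;
        }
        return dot / (norm1 * norm2);
    }

    /** Euclidean length of a sparse vector. */
    private static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<String, Double> a = Map.of("cat", 2.0, "sat", 1.0);
        Map<String, Double> b = Map.of("cat", 1.0, "mat", 1.0);
        // dot = 2, |a| = sqrt(5), |b| = sqrt(2): prints roughly 0.632 (= 2 / sqrt(10))
        System.out.println(cosine(a, b));
    }
}
```

Storing vectors as maps keeps the cost proportional to the number of distinct tokens in each excerpt rather than to the vocabulary size.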

cosSimTFIDF
```
public double cosSimTFIDF(gate.Annotation ann1,
                          gate.Document doc1,
                          gate.Annotation ann2,
                          gate.Document doc2)
```
Compute the TF-IDF similarity between the two document annotations. Term frequency of a sentence: the number of times the token appears in the sentence. Inverse document frequency: the logarithm of the total number of documents divided by the number of documents in which the token appears.

Parameters:

ann1 -

doc1 -

ann2 -

doc2 -

Returns:

the TF-IDF similarity of the two annotations

extractTokenList

```
public static List<String> extractTokenList(gate.Annotation ann,
                                            gate.Document doc,
                                            TokenFilterInterface tokenFilter,
                                            boolean onlyWordKind,
                                            boolean getLemma,
                                            boolean toLowerCase,
                                            boolean appendPOS,
                                            boolean removeStopWords,
                                            Set<String> stopWordsList)
```

Given an annotation of a TDDocument, extract the list of its tokens (a token that occurs several times is listed once per occurrence).

Parameters:

ann -

doc -

tokenFilter -

onlyWordKind -

getLemma -

toLowerCase -

appendPOS -

removeStopWords -

stopWordsList -

Returns:

the list of extracted tokens
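A minimal sketch of how boolean flags like the ones in this signature could shape the token list, under stated assumptions. The record `Tok` is a hypothetical stand-in for a GATE token annotation and its features; the `tokenFilter` parameter is not modeled; and the meaning given to each flag, the order in which they apply (stop words checked on the normalized text before the POS tag is attached), and the `_` separator for `appendPOS` are all guesses from the parameter names, not the library's documented behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class TokenListSketch {

    /** Hypothetical stand-in for a token annotation's features. */
    public record Tok(String string, String lemma, String pos, String kind) {}

    public static List<String> extractTokens(List<Tok> tokens,
                                             boolean onlyWordKind,
                                             boolean getLemma,
                                             boolean toLowerCase,
                                             boolean appendPOS,
                                             boolean removeStopWords,
                                             Set<String> stopWordsList) {
        List<String> out = new ArrayList<>();
        for (Tok t : tokens) {
            // Assumed meaning: keep only tokens whose kind is "word" (drops punctuation)
            if (onlyWordKind && !"word".equals(t.kind())) {
                continue;
            }
            // Use the lemma instead of the surface form when requested and available
            String text = (getLemma && t.lemma() != null) ? t.lemma() : t.string();
            if (toLowerCase) {
                text = text.toLowerCase(Locale.ROOT);
            }
            // Assumption: stop words are matched before the POS tag is appended
            if (removeStopWords && stopWordsList != null && stopWordsList.contains(text)) {
                continue;
            }
            // Assumption: "_" separates the token from its POS tag
            if (appendPOS && t.pos() != null) {
                text = text + "_" + t.pos();
            }
            // Repeated tokens are kept, one entry per occurrence
            out.add(text);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> toks = List.of(
                new Tok("The", "the", "DT", "word"),
                new Tok("cats", "cat", "NNS", "word"),
                new Tok("saw", "see", "VBD", "word"),
                new Tok("cats", "cat", "NNS", "word"),
                new Tok(".", ".", ".", "punctuation"));
        // prints [cat_NNS, see_VBD, cat_NNS]
        System.out.println(extractTokens(toks, true, true, true, true, true, Set.of("the")));
    }
}
```

Keeping one entry per occurrence, as the description above states, is what lets a later step count term frequencies directly from the list.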
