SUMMA Centroid Sentence Similarity
Functionality
Adds a feature ('centroid_sim') to each sentence annotation. Its value is the similarity between the sentence's vector and the centroid of the set of documents in the corpus.
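As an illustration of the idea only, the following Python sketch scores sentences against a centroid. It assumes cosine similarity over sparse term-weight vectors; the description above does not name the similarity measure, so that choice is an assumption. The dictionary layout and the function names cosine_similarity and add_centroid_sim are hypothetical and are not part of SUMMA or the GATE API.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # An empty or all-zero vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def add_centroid_sim(sentences, centroid):
    """Store a 'centroid_sim' feature on every sentence (hypothetical data layout)."""
    for sentence in sentences:
        sentence["features"]["centroid_sim"] = cosine_similarity(
            sentence["vector"], centroid
        )
    return sentences


# Toy data: the centroid stands in for the vector stored as a corpus feature,
# and each sentence carries its own precomputed vector.
centroid = {"summary": 2.0, "text": 1.0}
sentences = [
    {"vector": {"summary": 2.0, "text": 1.0}, "features": {}},
    {"vector": {"weather": 3.0}, "features": {}},
]
add_centroid_sim(sentences, centroid)
```

In this toy data the first sentence points in the same direction as the centroid and gets the highest possible score, while the second shares no terms with it and scores zero.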
Parameters of the Resource
- annSet: the annotation set where the annotations live
- sentAnn: the name of the annotation for which you want to compute the feature (e.g. Sentence)
- corpus: the corpus with the documents
- sentVec: the name of the annotation with the sentence vector
- centroid: the vector with the centroid (a feature of the corpus)
Restriction
This resource should be used in a GATE pipeline; it does not make sense to use it in a Corpus Pipeline. Before it runs, a centroid must exist as a feature of the corpus, and sentence vectors must exist in the documents.