SUMMA Centroid Sentence Similarity

Functionality

Adds to each sentence a feature ('centroid_sim') representing the similarity of the sentence to the centroid of the set of documents.
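
As a rough illustration of the computation described above, the following sketch assumes that sentence vectors and the centroid are sparse term-weight maps and that similarity means cosine similarity. The source does not state which measure the resource uses, so cosine is an assumption, and the names cosine_similarity and add_centroid_sim and the dictionary layout are hypothetical, not part of SUMMA or GATE.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts.

    Assumption: the real resource may use a different similarity measure.
    """
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # An empty or all-zero vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def add_centroid_sim(sentences, centroid):
    """Store a 'centroid_sim' feature on each sentence (hypothetical layout).

    Each sentence is a dict with a 'vector' ({term: weight}) and a
    'features' dict, standing in for a sentence annotation.
    """
    for sentence in sentences:
        sentence["features"]["centroid_sim"] = cosine_similarity(
            sentence["vector"], centroid
        )
    return sentences
```

A sentence whose vector points in the same direction as the centroid scores close to 1, and a sentence that shares no terms with the centroid scores 0.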

Parameters of the Resource

annSet: the annotation set where the annotations live
sentAnn: the name of the annotation for which you want to compute the feature (e.g. Sentence)
corpus: the corpus with the documents
sentVec: the name of the annotation with the sentence vector
centroid: the vector with the centroid (a feature of the corpus)

Restriction

Use this resource in a GATE pipeline; it does not make sense to use it in a Corpus Pipeline. A centroid must already exist as a feature of the corpus, and sentence vectors must already exist in the documents.
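
The two preconditions can be pictured as a simple check that runs before any similarity is computed. The sketch below models the corpus features and documents as plain dictionaries; the function check_preconditions, the dictionary layout, and the example names are hypothetical and do not reflect the GATE API.

```python
def check_preconditions(corpus_features, documents, centroid_key, sent_vec_type):
    """Return a list of problems; an empty list means both preconditions hold.

    corpus_features: dict standing in for the feature map of the corpus.
    documents: list of dicts with a 'name' and an 'annotations' dict that
               maps an annotation type to a list of annotations.
    centroid_key: name of the corpus feature expected to hold the centroid.
    sent_vec_type: annotation type expected to hold the sentence vectors.
    """
    problems = []
    if centroid_key not in corpus_features:
        problems.append(f"corpus has no '{centroid_key}' feature")
    for doc in documents:
        # A missing key and an empty list both count as "no sentence vectors".
        if not doc["annotations"].get(sent_vec_type):
            problems.append(
                f"document '{doc['name']}' has no '{sent_vec_type}' annotations"
            )
    return problems
```

Collecting every problem instead of stopping at the first one makes it easier to see which upstream step (centroid computation or sentence vector creation) still has to run.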
