Class NgramsC

  • All Implemented Interfaces:
    edu.upf.taln.ml.feat.base.FeatCalculator<String,gate.Annotation,DocumentCtx>

    public class NgramsC
    extends Object
    implements edu.upf.taln.ml.feat.base.FeatCalculator<String,gate.Annotation,DocumentCtx>
    Generate skipgrams from the text or the lemmatized text of a sentence.
    • Constructor Summary

      Constructors 
      Constructor Description
      NgramsC(String tokenAnnotationSet, String tokenAnnotationName, String featureName, String featureFilterName, String featureFilterValue, boolean featureFilterStartsWith, Integer ngramFactor, boolean removeStopWords, boolean excludeCitSpan)
      Generate a list of n-grams from the document-ordered sequence of tokens, taking the text of each token from the value of a specific feature (such as lemma or string). The three arguments whose names start with featureFilter restrict, by feature name and value, the annotations to consider.
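To illustrate the n-gram generation described above, here is a minimal sketch; it is not the NgramsC implementation. It assumes that ngramFactor is the n-gram length n, that stop words are dropped before n-grams are built, and that n-gram tokens are joined with a single space. The class NgramSketch, the method ngrams and the tiny stop-word set are hypothetical. Note that it produces contiguous n-grams only, whereas the class summary speaks of skipgrams.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class NgramSketch {

    // Hypothetical stop-word list, for illustration only.
    static final Set<String> STOP_WORDS = Set.of("the", "a", "of");

    /**
     * Builds contiguous n-grams from token values given in document order.
     * Assumptions: n plays the role of ngramFactor, and stop words are
     * dropped before n-grams are formed when removeStopWords is true.
     */
    static List<String> ngrams(List<String> tokens, int n, boolean removeStopWords) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // Keep the tokens that survive optional stop-word removal.
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!removeStopWords || !STOP_WORDS.contains(t.toLowerCase())) {
                kept.add(t);
            }
        }
        // Slide a window of size n over the kept tokens.
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= kept.size(); i++) {
            result.add(String.join(" ", kept.subList(i, i + n)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Token values as they might be read from a "lemma" feature.
        List<String> lemmas = List.of("the", "model", "learn", "the", "weight");
        System.out.println(ngrams(lemmas, 2, false)); // [the model, model learn, learn the, the weight]
        System.out.println(ngrams(lemmas, 2, true));  // [model learn, learn weight]
    }
}
```

Removing stop words before windowing, as assumed here, lets n-grams span over the removed words ("model learn" followed by "learn weight"); filtering after windowing would instead discard every n-gram that contains a stop word.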
    • Constructor Detail

      • NgramsC

        public NgramsC(String tokenAnnotationSet,
                       String tokenAnnotationName,
                       String featureName,
                       String featureFilterName,
                       String featureFilterValue,
                       boolean featureFilterStartsWith,
                       Integer ngramFactor,
                       boolean removeStopWords,
                       boolean excludeCitSpan)
        Generate a list of n-grams from the document-ordered sequence of tokens, taking the text of each token from the value of a specific feature (such as lemma or string). The three arguments whose names start with featureFilter restrict, by feature name and value, the annotations to consider. If featureFilterName is null or empty, no filtering is performed. If featureFilterName is neither null nor empty, only annotations that have a feature with that name are considered. If featureFilterValue is also neither null nor empty, only annotations whose featureFilterName feature has a value equal to featureFilterValue are considered. If featureFilterStartsWith is true, the check only requires the feature value to start with featureFilterValue. If excludeCitSpan is true, tokens inside an inline citation span are not considered.
        Parameters:
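        tokenAnnotationSet - name of the annotation set that contains the token annotations
        tokenAnnotationName - name (type) of the token annotations
        featureName - name of the token feature whose value is used as the token text (such as lemma or string)
        featureFilterName - name of the feature used to filter the token annotations; if null or empty, no filtering is performed
        featureFilterValue - value that the featureFilterName feature must equal (or start with, if featureFilterStartsWith is true)
        featureFilterStartsWith - if true, the featureFilterName feature value only has to start with featureFilterValue
        ngramFactor -
        removeStopWords - if true, stop-word tokens are not considered
        excludeCitSpan - if true, tokens inside an inline citation span are not considered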
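The featureFilter rules in the constructor description can be read as a single token predicate. The sketch below illustrates those rules under stated assumptions and is not the library's code: a token is modelled as a plain Map<String, String> of feature names to values instead of a gate.Annotation, and the names FeatureFilterSketch, accepts and the "category" feature in the demo are hypothetical.

```java
import java.util.Map;

public class FeatureFilterSketch {

    // "Null or empty" check used for the filter name and value.
    static boolean isBlank(String s) {
        return s == null || s.isEmpty();
    }

    /**
     * Returns true if a token, modelled here as a feature map rather than
     * a gate.Annotation, passes the filter described for NgramsC.
     */
    static boolean accepts(Map<String, String> features,
                           String featureFilterName,
                           String featureFilterValue,
                           boolean featureFilterStartsWith) {
        // No filter name: every token is considered.
        if (isBlank(featureFilterName)) {
            return true;
        }
        // A filter name is set: the token must carry that feature.
        if (!features.containsKey(featureFilterName)) {
            return false;
        }
        // No filter value: having the feature is enough.
        if (isBlank(featureFilterValue)) {
            return true;
        }
        String value = features.get(featureFilterName);
        if (value == null) {
            return false;
        }
        // Prefix match or exact match, depending on featureFilterStartsWith.
        return featureFilterStartsWith
                ? value.startsWith(featureFilterValue)
                : value.equals(featureFilterValue);
    }

    public static void main(String[] args) {
        Map<String, String> noun = Map.of("category", "NNS", "lemma", "model");
        System.out.println(accepts(noun, null, null, false));        // true: no filter
        System.out.println(accepts(noun, "category", "NN", false));  // false: "NNS" is not "NN"
        System.out.println(accepts(noun, "category", "NN", true));   // true: "NNS" starts with "NN"
        System.out.println(accepts(noun, "kind", null, false));      // false: feature missing
    }
}
```

A prefix filter such as "NN" with featureFilterStartsWith set to true is a compact way to keep every noun tag (NN, NNS, NNP, ...) at once, whereas an exact match would keep only one of them.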
    • Method Detail

      • calculateFeature

        public edu.upf.taln.ml.feat.base.MyString calculateFeature(gate.Annotation obj,
                                                                   DocumentCtx doc,
                                                                   String featName)
        Specified by:
        calculateFeature in interface edu.upf.taln.ml.feat.base.FeatCalculator<String,gate.Annotation,DocumentCtx>