The Dr Inventor Text Mining Library (DRI) integrates, in a single software platform, a collection of scientific text analysis modules that automatically extract a wide range of structural, linguistic and semantic features from the textual content of scientific publications. The text analysis modules of the DRI library have been either developed from scratch or adapted from existing text mining software and tailored to scientific documents. Each module is responsible for analyzing a particular aspect of the knowledge encoded in a scientific publication. In most cases, the processing results of a module are represented as textual annotations. These annotations are in turn exploited by other modules to analyze further facets of scientific articles.
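The idea of modules that communicate through annotations can be illustrated with a minimal Python sketch. This is not the DRI API: the `Annotation` and `Document` classes, the two toy modules, and `run_pipeline` are all hypothetical names introduced only to show how one module's annotations can feed the next.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str   # annotation type, e.g. "Sentence" or "Token" (illustrative)
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)

    def of_kind(self, kind):
        """Return all annotations of the given type, in insertion order."""
        return [a for a in self.annotations if a.kind == kind]


def sentence_splitter(doc):
    """Toy module: closes a sentence at every period."""
    start = 0
    for i, ch in enumerate(doc.text):
        if ch == ".":
            doc.annotations.append(Annotation("Sentence", start, i + 1))
            start = i + 1
            # Skip whitespace so the next sentence starts on a character.
            while start < len(doc.text) and doc.text[start].isspace():
                start += 1


def tokenizer(doc):
    """Toy module: reads Sentence annotations and adds Token annotations."""
    for sent in doc.of_kind("Sentence"):
        for m in re.finditer(r"\w+", doc.text[sent.start:sent.end]):
            doc.annotations.append(
                Annotation("Token", sent.start + m.start(), sent.start + m.end())
            )


def run_pipeline(text, modules):
    """Apply each module in order to a shared document."""
    doc = Document(text)
    for module in modules:
        module(doc)
    return doc


doc = run_pipeline(
    "We propose a parser. It improves recall.",
    [sentence_splitter, tokenizer],
)
sentences = doc.of_kind("Sentence")
tokens = doc.of_kind("Token")
```

In this sketch the tokenizer never inspects the raw text directly; it only works inside the spans left by the sentence splitter, which is the dependency pattern described above.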
The DRI library serves as the text mining tool for extracting and modeling knowledge from scientific publications. The results of the scientific text mining performed by the DRI library enable a wide range of scientific literature analyses and data aggregations. Among its features, the DRI library supports the representation of excerpts of scientific papers by means of Subject-Verb-Object graphs.
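As a rough illustration of what a Subject-Verb-Object graph can look like as a data structure, the following Python sketch stores triples as labeled edges. The `SVOGraph` class and the example triples are assumptions made for illustration; they are hand-written here, not produced by the DRI library, and they do not reflect its internal representation.

```python
from collections import defaultdict


class SVOGraph:
    """Hypothetical container for Subject-Verb-Object triples."""

    def __init__(self):
        # Maps each subject to a list of (verb, object) edges.
        self.edges = defaultdict(list)

    def add_triple(self, subject, verb, obj):
        self.edges[subject].append((verb, obj))

    def objects_of(self, subject, verb):
        """Return the objects linked to `subject` through `verb`."""
        return [o for v, o in self.edges.get(subject, []) if v == verb]


# Hand-written triples standing in for the output of a real extractor.
graph = SVOGraph()
graph.add_triple("method", "outperforms", "baseline")
graph.add_triple("method", "uses", "dependency parser")
graph.add_triple("baseline", "uses", "bag-of-words")

used_by_method = graph.objects_of("method", "uses")
```

Keeping the verb as an edge label makes it straightforward to query what a given entity does, or to compare two excerpts by the edges they share.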
The main features of the DRI library are:
- PDF to XML transformation,
- word and sentence identification,
- identification of citations and references, and enrichment of bibliographic entries with semantic information such as authors, publication title, journal, etc.,
- full dependency parsing,
- rhetorical annotation of sentences with respect to their role in the scientific argumentation (background, challenge, outcome, hypothesis, etc.),
- identification and extraction of all kinds of scientific entities, and their interlinking by means of coreference chains,
- polarity computation for citations, to distinguish, for example, praise, criticism, or neutral references to previous work,
- identification of causality information within and across sentence boundaries,
- summary generation,
- open information extraction from scientific documents.