Dr Inventor Text Mining Framework


The Dr Inventor Scientific Text Mining library is a fully implemented software framework for deep analysis of scientific publications. It applies Natural Language Processing and Machine Learning techniques to extract and summarize the contents of full scientific papers. In particular, our software's capabilities include: (i) transformation of PDF content into XML-compliant representations; (ii) identification of the structure and content of scientific documents (titles, abstracts, authors, affiliations, sections, subsections, figures, tables, bibliography, etc.); (iii) identification of citations and references, and enrichment of bibliographic entries with semantic information such as authors, publication title, and journal; (iv) rhetorical annotation of sentences with respect to their role in the scientific argumentation (background, challenge, outcome, hypothesis, etc.); (v) identification and extraction of scientific entities, interlinked by means of coreference chains; (vi) polarity computation for citations, to distinguish, for example, praise, criticism, and neutral references to previous work; (vii) identification of causality information within and across sentence boundaries; (viii) extractive summarization of the contents of one or several documents; and (ix) open information extraction from scientific documents.