All Known Implementing Classes:

DocumentImpl
```
public interface Document
```
Interface to access a Document processed by Dr Inventor.

To obtain a Document instance, always use one of the Factory methods:
- Factory.createNewDocument()
- Factory.createNewDocument(String absoluteFilePath)
- Factory.createNewDocument(File file)

Method Summary

Modifier and Type	Method	Description
`void`	`cleanUp()`	Call this method only when you are sure you will no longer use the document.
`List<Citation>`	`extractCitations()`	Get the list of citations extracted from the document.
`DependencyGraph`	`extractDocumentGraph(SentenceSelectorENUM sentenceSel)`	Get the graph representing a portion of a document.
`Header`	`extractHeader()`	Extract the information retrieved by parsing the header of the paper
`List<Section>`	`extractSections(Boolean onlyRoot)`	Get the sections (or a subset of the sections) of the document
`Sentence`	`extractSentenceById(int sentenceId)`	Get one of the sentences of the document by id
`DependencyGraph`	`extractSentenceGraph(int sentenceId, SentGraphTypeENUM graphType)`	Get the graph representing a sentence.
`List<Sentence>`	`extractSentences(SentenceSelectorENUM sentenceSel)`	Load the list of sentences of the document, ordered by their occurrence in the document.
`List<Sentence>`	`extractSummary(int sentNumber, SummaryTypeENUM summaryType)`	Generate a summary of the paper by selecting a relevant set of sentences.
`List<CandidateTermOcc>`	`extractTerminology()`	Load the list of terms extracted from the document.
`String`	`getName()`	Get the name of the document
`String`	`getRawText()`	Get the raw text of the document (UTF-8 encoded)
`SourceENUM`	`getSourceDocumentType()`	Get the original document type from which the `Document` instance has been created.
`Document`	`getXMLDocument()`	Get the contents of the document as an instance of org.w3c.dom.Document
`String`	`getXMLString()`	Get the XML string-serialized contents of the document, as a string (UTF-8 char encoding)
`boolean`	`isCleanUp()`	Check whether the document data structures have been cleaned by calling the cleanUp() method.
`void`	`loadXML(File file)`	Load the XML string-serialized contents of the document (UTF-8) from a file
`void`	`loadXML(String absoluteFilePath)`	Load the XML string-serialized contents of the document (UTF-8) from a file, by specifying the file's absolute path
`void`	`loadXMLString(String XMLStringContents)`	Load the XML string-serialized contents of the document from a string (UTF-8 char encoding)
`void`	`preprocess()`	Pre-compute the text analysis of the document in order to speed-up the execution of the extract-methods.
`void`	`resetDocumentExtractionData()`	This method deletes all the data extracted from the original document including sentences, terminology, citations, etc.

- Method Detail
  - getName
```
String getName()
        throws InternalProcessingException
```
    Get the name of the document
    
    Returns:
    
    the name of the document
    
    Throws:
    
    InternalProcessingException
  - getXMLString
```
String getXMLString()
             throws InternalProcessingException
```
    Get the XML string-serialized contents of the document, as a string (UTF-8 char encoding)
    
    Returns:
    
    the String representing the contents of the Document
    
    Throws:
    
    InternalProcessingException
  - getXMLDocument
```
Document getXMLDocument()
                 throws InternalProcessingException
```
    Get the contents of the document as an instance of org.w3c.dom.Document
    
    Returns:
    
    the document as an instance of the class org.w3c.dom.Document
    
    Throws:
    
    InternalProcessingException
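    The returned object is a standard DOM tree, so it can be inspected with the JDK's own DOM API. Note that org.w3c.dom.Document shares its simple name with this interface, so one of the two has to be referenced by its fully qualified name in calling code. The sketch below is only an illustration: it parses a hand-written XML string (the element names `doc` and `s` are hypothetical, not the schema produced by Dr Inventor) to show the kind of queries that are possible on such a tree.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class XmlDomSketch {

    // Parse a UTF-8 XML string into a standard DOM tree.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Count the elements with the given tag name anywhere in the tree.
    static int countElements(Document dom, String tagName) {
        NodeList nodes = dom.getElementsByTagName(tagName);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical payload standing in for the output of getXMLString().
        String xml = "<doc><s>First sentence.</s><s>Second sentence.</s></doc>";
        Document dom = parse(xml);
        System.out.println(dom.getDocumentElement().getTagName()); // prints "doc"
        System.out.println(countElements(dom, "s"));               // prints 2
    }
}
```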
  - loadXML
```
void loadXML(String absoluteFilePath)
      throws DRIexception
```
    Load the XML string-serialized contents of the document (UTF-8) from a file, by specifying the file's absolute path
    
    Parameters:
    
    absoluteFilePath - the absolute path of the file with the XML string-serialized contents of the document to load
    
    Throws:
    
    DRIexception
  - loadXML
```
void loadXML(File file)
      throws DRIexception
```
    Load the XML string-serialized contents of the document (UTF-8) from a file
    
    Parameters:
    
    file - the file with the XML string-serialized contents of the document to load
    
    Throws:
    
    DRIexception
  - loadXMLString
```
void loadXMLString(String XMLStringContents)
            throws DRIexception
```
    Load the XML string-serialized contents of the document from a string (UTF-8 char encoding)
    
    Parameters:
    
    XMLStringContents - the String with the XML serialized contents to load
    
    Throws:
    
    InternalProcessingException
    
    DRIexception
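    Since the load methods expect UTF-8 content, a string obtained from getXMLString() should be written to disk with an explicit UTF-8 encoding before it is handed back to loadXML. The following is a minimal sketch of that round trip using only the JDK; the class and method names are hypothetical helpers, and the XML payload is a made-up stand-in for a real serialized document.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class XmlFileRoundTrip {

    // Write the serialized document to disk, explicitly encoded as UTF-8.
    static void save(String xml, Path target) throws IOException {
        Files.write(target, xml.getBytes(StandardCharsets.UTF_8));
    }

    // Read the file back and decode it as UTF-8.
    static String load(Path source) throws IOException {
        return new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical payload with a non-ASCII character to exercise the encoding.
        String xml = "<doc><title>Caf\u00e9</title></doc>";
        Path file = Files.createTempFile("document", ".xml");
        try {
            save(xml, file);
            // The path or file could now be passed to loadXML(String) or loadXML(File).
            System.out.println(file.toAbsolutePath());
            System.out.println(load(file).equals(xml)); // prints "true"
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```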
  - getRawText
```
String getRawText()
           throws InternalProcessingException
```
    Get the raw text of the document (UTF-8 encoded)
    
    Returns:
    
    the UTF-8 encoded text of the document
    
    Throws:
    
    InternalProcessingException
  - preprocess
```
void preprocess()
         throws InternalProcessingException
```
    Pre-compute the text analysis of the document in order to speed-up the execution of the extract-methods.
    
    Throws:
    
    InternalProcessingException
  - extractHeader
```
Header extractHeader()
              throws InternalProcessingException
```
    Extract the information retrieved by parsing the header of the paper
    
    Returns:
    
    Throws:
    
    InternalProcessingException
  - extractSections
```
List<Section> extractSections(Boolean onlyRoot)
                       throws InternalProcessingException
```
    Get the sections (or a subset of the sections) of the document
    
    Parameters:
    
    onlyRoot - if equal to true, extract only the top level sections (h1)
    
    Returns:
    
    Throws:
    
    InternalProcessingException
  - extractSentences
```
List<Sentence> extractSentences(SentenceSelectorENUM sentenceSel)
                         throws InternalProcessingException
```
    Load the list of sentences of the document, ordered by their occurrence in the document. If sentences have not been extracted, the first time this method is executed the document text is split into sentences.
    
    Parameters:
    
    sentenceSel - the type of sentence to select
    
    Returns:
    
    the set of sentences in document order
    
    Throws:
    
    InternalProcessingException
  - extractSentenceById
```
Sentence extractSentenceById(int sentenceId)
                      throws InternalProcessingException
```
    Get one of the sentences of the document by id
    
    Parameters:
    
    sentenceId -
    
    Returns:
    
    the sentence with the given id, or null if the id is not valid
    
    Throws:
    
    InternalProcessingException
  - extractTerminology
```
List<CandidateTermOcc> extractTerminology()
                                   throws DRIexception
```
    Load the list of terms extracted from the document. If the terminology has not been extracted from the document, the first time this method is executed relevant terms are extracted from the document.
    
    Returns:
    
    the list of terms extracted from the document
    
    Throws:
    
    DRIexception
  - extractSummary
```
List<Sentence> extractSummary(int sentNumber,
                              SummaryTypeENUM summaryType)
                       throws InternalProcessingException
```
    Generate a summary of the paper by selecting a relevant set of sentences. Sentences are ordered by their relevance in descending order.
    
    Parameters:
    
    sentNumber - the number of sentences to include in the summary, from 1 to 30
    
    summaryType -
    
    Returns:
    
    Throws:
    
    InternalProcessingException
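    The documentation gives the range 1 to 30 for sentNumber but does not say what happens outside it. A cautious caller might therefore bring the value into range before calling extractSummary. The helper below is a hypothetical sketch of that idea, not part of the Dr Inventor API.

```java
final class SummarySize {

    // Bounds taken from the documented range of sentNumber.
    static final int MIN_SENTENCES = 1;
    static final int MAX_SENTENCES = 30;

    // Bring a requested summary length into the documented range.
    static int clamp(int requested) {
        return Math.max(MIN_SENTENCES, Math.min(MAX_SENTENCES, requested));
    }

    public static void main(String[] args) {
        System.out.println(clamp(0));   // prints 1
        System.out.println(clamp(12));  // prints 12
        System.out.println(clamp(100)); // prints 30
    }
}
```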
  - extractSentenceGraph
```
DependencyGraph extractSentenceGraph(int sentenceId,
                                     SentGraphTypeENUM graphType)
                              throws DRIexception
```
    Get the graph representing a sentence. The id of a sentence can be obtained from the sentences returned by the method extractSentences(). NB: an experimental approach to merging sentence graphs is implemented.
    
    Parameters:
    
    sentenceId -
    
    graphType -
    
    Returns:
    
    Throws:
    
    DRIexception
  - extractDocumentGraph
```
DependencyGraph extractDocumentGraph(SentenceSelectorENUM sentenceSel)
                              throws DRIexception
```
    Get the graph representing a portion of a document. The nodes of the graph are merged by relying on co-reference chains.
    
    Parameters:
    
    sentenceSel -
    
    Returns:
    
    Throws:
    
    DRIexception
  - extractCitations
```
List<Citation> extractCitations()
                         throws InternalProcessingException
```
    Get the list of citations extracted from the document.
    
    Returns:
    
    Throws:
    
    InternalProcessingException
  - resetDocumentExtractionData
```
void resetDocumentExtractionData()
                          throws InternalProcessingException
```
    This method deletes all the data extracted from the original document, including sentences, terminology, citations, etc. After calling this method on a Document object, the next time sentences, terminology, citations, etc. are accessed, they are extracted again rather than read from the results of a previous extraction.
    
    Throws:
    
    InternalProcessingException
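    The behaviour described here is that of a lazily filled cache: data is extracted on first access, reused afterwards, and extracted again after a reset. The toy class below models only that pattern. It is a hypothetical stand-in, not DocumentImpl: it works on plain strings, uses a naive punctuation-based split instead of real text analysis, and adds an extractionRuns() counter purely for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class LazyExtractionSketch {

    private final String rawText;
    private List<String> sentences;   // null until the first extraction
    private int extractionRuns = 0;   // illustration only: counts real extractions

    LazyExtractionSketch(String rawText) {
        this.rawText = rawText;
    }

    // Extract on first access, then serve the cached result.
    List<String> extractSentences() {
        if (sentences == null) {
            // Naive split after sentence-ending punctuation; a stand-in for real analysis.
            sentences = Arrays.asList(rawText.split("(?<=[.!?])\\s+"));
            extractionRuns++;
        }
        return sentences;
    }

    // Drop the cached data so that the next access extracts again.
    void resetDocumentExtractionData() {
        sentences = null;
    }

    int extractionRuns() {
        return extractionRuns;
    }

    public static void main(String[] args) {
        LazyExtractionSketch doc = new LazyExtractionSketch("One. Two. Three.");
        doc.extractSentences();
        doc.extractSentences();                   // served from the cache
        System.out.println(doc.extractionRuns()); // prints 1
        doc.resetDocumentExtractionData();
        doc.extractSentences();                   // extracted again
        System.out.println(doc.extractionRuns()); // prints 2
    }
}
```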
  - getSourceDocumentType
```
SourceENUM getSourceDocumentType()
                          throws InternalProcessingException
```
    Get the original document type from which the Document instance has been created. The possible document types are the values of SourceENUM.
    
    Returns:
    
    Throws:
    
    InternalProcessingException
  - cleanUp
```
void cleanUp()
      throws InternalProcessingException
```
    Call this method only when you are sure you will no longer use the document. It clears all the document data structures, making the memory they occupy eligible for garbage collection. If you access or call methods of the document after calling this method, an exception is raised stating that the resource has already been closed and its data cleaned.
    
    Throws:
    
    InternalProcessingException
  - isCleanUp
```
boolean isCleanUp()
           throws InternalProcessingException
```
    Check whether the document data structures have been cleaned by calling the cleanUp() method. A cleaned-up document can no longer be used; if you access or call methods of the document after cleanUp() has been called, an exception is raised stating that the resource has already been closed and its data cleaned.
    
    Returns:
    
    true if the document data structures have been cleaned.
    
    Throws:
    
    InternalProcessingException
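
The cleanUp() contract suggests releasing a document in a finally block once all work on it is done, and guarding later accesses with isCleanUp(). The class below is a hypothetical toy model of that lifecycle, not the real implementation: it holds only a name, and it uses IllegalStateException as a stand-in for whatever exception the library actually raises.

```java
final class CleanUpSketch {

    private String name;
    private boolean cleaned = false;

    CleanUpSketch(String name) {
        this.name = name;
    }

    String getName() {
        ensureOpen();
        return name;
    }

    // Drop references so the memory they hold can be garbage collected.
    void cleanUp() {
        name = null;
        cleaned = true;
    }

    boolean isCleanUp() {
        return cleaned;
    }

    // Stand-in for the library's "resource already closed" exception.
    private void ensureOpen() {
        if (cleaned) {
            throw new IllegalStateException("document already cleaned up");
        }
    }

    public static void main(String[] args) {
        CleanUpSketch doc = new CleanUpSketch("paper.pdf");
        try {
            System.out.println(doc.getName());   // prints "paper.pdf"
        } finally {
            doc.cleanUp();                       // release once all work is done
        }
        System.out.println(doc.isCleanUp());     // prints "true"
        if (!doc.isCleanUp()) {
            System.out.println(doc.getName());   // not reached: document is cleaned
        }
    }
}
```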
