Class PDFPaperParser


  • public class PDFPaperParser
    extends Object
    Collection of utilities to mine data from PDF documents
    • Constructor Detail

      • PDFPaperParser

        public PDFPaperParser()
    • Method Detail

      • getHeaderSentences

        public static List<String> getHeaderSentences(InputStream PDFInputStream,
                                                      boolean onlyHeaderParagraph)
        Extracts a list of paragraphs from the PDF by converting the PDF file to HTML.
        Parameters:
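        PDFInputStream - input stream of the PDF document to parse
        onlyHeaderParagraph - set to true to extract only the header paragraph
        Returns:
        list of paragraphs extracted from the PDF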
      • getHeaderSentences

        public static List<String> getHeaderSentences(URL PDF_URL,
                                                      boolean onlyHeaderParagraph)
        Extracts a list of paragraphs from the PDF by converting the PDF file to HTML.
        Parameters:
        PDF_URL - URL of the PDF document to parse
        onlyHeaderParagraph - set to true to extract only the header paragraph
        Returns:
        list of paragraphs extracted from the PDF
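
The two getHeaderSentences overloads share the same shape: a PDF source plus the onlyHeaderParagraph flag, returning a List<String>. The sketch below illustrates one way a caller might wrap the InputStream overload. It makes several assumptions that this page does not confirm: that the caller is responsible for closing the stream, and that a null result should be treated as an empty list. HeaderSentencesDemo, HeaderExtractor, and extractAndClose are hypothetical names introduced only for illustration. A stub extractor stands in for the real method so the example runs without the library on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class HeaderSentencesDemo {

    /**
     * Hypothetical stand-in with the same parameters and return type as
     * getHeaderSentences(InputStream, boolean). With the library available,
     * PDFPaperParser::getHeaderSentences could presumably be passed here.
     */
    @FunctionalInterface
    interface HeaderExtractor {
        List<String> extract(InputStream pdf, boolean onlyHeaderParagraph);
    }

    /**
     * Runs the extractor and always closes the stream afterwards.
     * Assumption: the extractor does not close the stream itself.
     * Assumption: a null result is mapped to an empty list.
     */
    static List<String> extractAndClose(InputStream pdf,
                                        boolean onlyHeaderParagraph,
                                        HeaderExtractor extractor) {
        try (InputStream in = pdf) {
            List<String> paragraphs = extractor.extract(in, onlyHeaderParagraph);
            return paragraphs == null ? Collections.<String>emptyList() : paragraphs;
        } catch (IOException e) {
            // Only close() can raise IOException in this block.
            throw new UncheckedIOException("Could not close the PDF stream", e);
        }
    }

    public static void main(String[] args) {
        // Stub extractor with made-up paragraphs; it ignores the stream contents.
        HeaderExtractor stub = (in, onlyHeader) -> onlyHeader
                ? Arrays.asList("A Sample Title", "Jane Doe, Example University")
                : Arrays.asList("A Sample Title", "Jane Doe, Example University",
                                "1 Introduction");

        InputStream fakePdf = new ByteArrayInputStream(new byte[0]);
        List<String> header = extractAndClose(fakePdf, true, stub);
        header.forEach(System.out::println);
    }
}
```

The URL overload takes the same flag, so a similar wrapper would only differ in how the source is supplied. Passing true for onlyHeaderParagraph limits the result to the header paragraph, as described above.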