Class PDFextConn


  • public class PDFextConn
    extends Object
    Converts papers in PDF format to XML by means of PDFext (http://pdfext.taln.upf.edu/). EXPERIMENTAL!
    • Field Detail

      • useProxy

        public static boolean useProxy
      • proxyScheme

        public static String proxyScheme
      • proxyHost

        public static String proxyHost
      • proxyPort

        public static Integer proxyPort
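The four static fields above appear to hold a global proxy configuration for the calls to the web service. The following is a minimal sketch of how such fields could be turned into a java.net.Proxy. It is not PDFextConn's code: the class ProxySettingsSketch and its toProxy() method are hypothetical names, and the mapping from proxyScheme to a Proxy type is an assumption.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.Locale;

/**
 * Hypothetical helper (not part of PDFextConn): builds a java.net.Proxy
 * from fields shaped like the documented useProxy/proxyScheme/proxyHost/proxyPort.
 */
final class ProxySettingsSketch {

    // Mirrors of the documented static fields (same names and types).
    static boolean useProxy = false;
    static String proxyScheme = "http";
    static String proxyHost = null;
    static Integer proxyPort = null;

    private ProxySettingsSketch() {}

    /** Returns Proxy.NO_PROXY unless useProxy is set and both host and port are present. */
    static Proxy toProxy() {
        if (!useProxy || proxyHost == null || proxyPort == null) {
            return Proxy.NO_PROXY;
        }
        // Assumption: schemes starting with "socks" mean a SOCKS proxy, anything else HTTP.
        Proxy.Type type = proxyScheme != null
                && proxyScheme.toLowerCase(Locale.ROOT).startsWith("socks")
                ? Proxy.Type.SOCKS
                : Proxy.Type.HTTP;
        // createUnresolved avoids a DNS lookup while building the address.
        return new Proxy(type, InetSocketAddress.createUnresolved(proxyHost, proxyPort));
    }
}
```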
    • Constructor Detail

      • PDFextConn

        public PDFextConn()
    • Method Detail

      • processPDFfile

        public static Map<String,String> processPDFfile(String inputFilePath,
                                                              int timeout)
        Get a PDF file (max 5 MB) by means of its path and transform it to an XML annotated file by means of the PDFX Web Service (http://pdfx.cs.man.ac.uk/).
        Parameters:
        inputFilePath - path of the PDF file to convert
        timeout - set the socket timeout in milliseconds
        Returns:
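The maximum file size stated for processPDFfile can be checked locally before anything is uploaded. The following is a minimal sketch, assuming the limit means 5 MiB of file size. PdfUploadCheckSketch, readPdfForUpload and looksLikePdf are hypothetical names, and the "%PDF-" header test is an extra safeguard that the source does not mention.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical pre-upload checks (not part of PDFextConn). */
final class PdfUploadCheckSketch {

    // Assumption: the documented "max 5 MB" is read as 5 MiB = 5 * 1024 * 1024 bytes.
    static final long MAX_BYTES = 5L * 1024 * 1024;

    private PdfUploadCheckSketch() {}

    /** Reads the file at inputFilePath, rejecting it if it exceeds MAX_BYTES. */
    static byte[] readPdfForUpload(String inputFilePath) throws IOException {
        Path path = Path.of(inputFilePath);
        long size = Files.size(path);
        if (size > MAX_BYTES) {
            throw new IllegalArgumentException(
                    "PDF is " + size + " bytes, limit is " + MAX_BYTES + " bytes: " + inputFilePath);
        }
        return Files.readAllBytes(path);
    }

    /** Cheap sanity check: PDF files normally start with the "%PDF-" header. */
    static boolean looksLikePdf(byte[] bytes) {
        byte[] magic = {'%', 'P', 'D', 'F', '-'};
        if (bytes.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (bytes[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The returned byte array has the same shape as the inputBytes argument of processPDF, which is one plausible way the two methods relate; the source does not confirm it.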
      • processPDF

        public static String processPDF(byte[] inputBytes,
                                        String tags,
                                        int timeout,
                                        String fileName)
        Get a PDF file as a byte array and transform it to an XML annotated file by means of the PDFext Web Service (http://pdfext.taln.upf.edu/).
        Parameters:
        inputBytes - content of the PDF file as a byte array
        timeout - set the socket timeout in milliseconds
        Returns:
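The timeout parameter is documented as a socket timeout in milliseconds. The following sketch shows how such a value is typically applied with the JDK's HttpURLConnection. The endpoint URL, the POST method and the application/pdf content type are assumptions, since the source does not document the PDFext request format. TimeoutConnectionSketch and openPost are hypothetical names.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical sketch (not PDFextConn's code): a POST connection with millisecond timeouts. */
final class TimeoutConnectionSketch {

    private TimeoutConnectionSketch() {}

    /**
     * Opens, but does not yet connect, a POST connection to serviceUrl.
     * Assumption: the service accepts the raw PDF bytes as the request body.
     */
    static HttpURLConnection openPost(String serviceUrl, int timeoutMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(serviceUrl).toURL().openConnection();
        conn.setConnectTimeout(timeoutMs); // limit for establishing the TCP connection
        conn.setReadTimeout(timeoutMs);    // socket timeout: max wait for data while reading
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);            // the PDF bytes go in the request body
        conn.setRequestProperty("Content-Type", "application/pdf");
        return conn;
    }
}
```

Writing inputBytes to conn.getOutputStream() and then reading conn.getInputStream() would perform the call. If the service sends nothing for longer than timeoutMs while the response is being read, a java.net.SocketTimeoutException is thrown.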
      • convertFilesAndStore

        public static int convertFilesAndStore(String fileOrDirFullPath,
                                               String tags,
                                               boolean recursiveDir,
                                               int timeout)
        Convert a PDF file, or all the PDF files in a directory (recursively if recursiveDir is true), by means of PDFext.
        Parameters:
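        fileOrDirFullPath - full path of the PDF file or of the directory to convert
        tags -
        recursiveDir - if true, the PDF files in subdirectories are converted as well
        timeout - set the socket timeout in milliseconds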
        Returns:
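The following is a sketch of how the input files for convertFilesAndStore could be selected. It assumes two things: with recursiveDir set to false, only the PDF files directly inside the directory are considered, and PDF files are recognised by their .pdf extension, ignoring case. PdfFileCollectorSketch and collectPdfFiles are hypothetical names, not part of PDFextConn.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of input selection for a batch PDF conversion. */
final class PdfFileCollectorSketch {

    private PdfFileCollectorSketch() {}

    /**
     * Returns the PDF file itself, or the PDF files in the directory:
     * only its direct children when recursiveDir is false, the whole subtree otherwise.
     */
    static List<Path> collectPdfFiles(String fileOrDirFullPath, boolean recursiveDir) throws IOException {
        Path root = Path.of(fileOrDirFullPath);
        if (Files.isRegularFile(root)) {
            return isPdf(root) ? List.of(root) : List.of();
        }
        int maxDepth = recursiveDir ? Integer.MAX_VALUE : 1;
        // The stream from Files.walk must be closed to release directory handles.
        try (Stream<Path> paths = Files.walk(root, maxDepth)) {
            return paths.filter(Files::isRegularFile)
                        .filter(PdfFileCollectorSketch::isPdf)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    /** Assumption: a file is a PDF if its name ends with ".pdf", ignoring case. */
    private static boolean isPdf(Path p) {
        return p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".pdf");
    }
}
```

Each returned path could then be read into a byte array and passed to processPDF.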
      • main

        public static void main(String[] args)