Class PDFXConn


  • public class PDFXConn
    extends Object
    Converts papers in PDF format to XML by means of PDFX (http://pdfx.cs.man.ac.uk/). Utility methods to compress the images embedded in PDF files are also provided.
    • Constructor Summary

      Constructors 
      Constructor Description
      PDFXConn()  
    • Method Summary

      Modifier and Type Method Description
      static int convertFilesAndStore(String fileOrDirFullPath, boolean enablePDFcompression, float compressionFactor, boolean recursiveDir, int timeout)
      Convert a PDF file, or all PDF files in a directory (optionally recursing into subdirectories), to XML by means of PDFX (http://pdfx.cs.man.ac.uk/).
      static byte[] pdfCompress(byte[] inputPDF, float compressionFactor, boolean greyImages)
      Compress the images included in a PDF file in order to reduce the file size.
      static String processPDF(byte[] inputBytes, int timeout)
      Get a PDF file (max 5 MB) as a byte array and transform it into an annotated XML file by means of the PDFX Web Service (http://pdfx.cs.man.ac.uk/).
      static Map<String,String> processPDFfile(String inputFilePath, int timeout)
      Get a PDF file (max 5 MB) by its path and transform it into an annotated XML file by means of the PDFX Web Service (http://pdfx.cs.man.ac.uk/).
    • Field Detail

      • useProxy

        public static boolean useProxy
      • proxyScheme

        public static String proxyScheme
      • proxyHost

        public static String proxyHost
      • proxyPort

        public static Integer proxyPort
    • Constructor Detail

      • PDFXConn

        public PDFXConn()
    • Method Detail

      • pdfCompress

        public static byte[] pdfCompress(byte[] inputPDF,
                                         float compressionFactor,
                                         boolean greyImages)
        Compress the images included in a PDF file in order to reduce the file size. The PDF is passed in and returned as a byte array.
        Parameters:
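        inputPDF - the input PDF file as a byte array
        compressionFactor - the compression factor applied to the images
        greyImages -
        Returns:
        the PDF file with compressed images, as a byte array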
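
        The documentation above does not say how pdfCompress shrinks the embedded images. One common approach is to re-encode each image as JPEG at a lower quality, optionally after converting it to greyscale. The sketch below shows that idea for a single image using only the JDK. The class ImageRecompressSketch and its method are hypothetical and not part of PDFXConn, and the mapping of compressionFactor to a JPEG quality in [0, 1] and of greyImages to a greyscale conversion is an assumption.

```java
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper (not part of PDFXConn): re-encodes one image as JPEG. */
final class ImageRecompressSketch {

    /**
     * @param quality JPEG quality in [0, 1]; assumed analogue of compressionFactor
     * @param grey    convert to greyscale first; assumed analogue of greyImages
     */
    static byte[] toJpeg(BufferedImage source, float quality, boolean grey) {
        // Redraw into an image type that the JPEG writer accepts (no alpha channel).
        int type = grey ? BufferedImage.TYPE_BYTE_GRAY : BufferedImage.TYPE_INT_RGB;
        BufferedImage target = new BufferedImage(source.getWidth(), source.getHeight(), type);
        Graphics2D g = target.createGraphics();
        g.drawImage(source, 0, 0, null);
        g.dispose();

        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality); // lower quality gives smaller output

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(out)) {
            writer.setOutput(ios);
            writer.write(null, new IIOImage(target, null, null), param);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.dispose();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        System.out.println("colour bytes: " + toJpeg(img, 0.8f, false).length);
        System.out.println("grey bytes:   " + toJpeg(img, 0.3f, true).length);
    }
}
```

        Applying this to a whole PDF would additionally require a PDF library to extract and replace the image streams, which is outside the scope of this sketch.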
      • processPDFfile

        public static Map<String,String> processPDFfile(String inputFilePath,
                                                        int timeout)
        Get a PDF file (max 5 MB) by its path and transform it into an annotated XML file by means of the PDFX Web Service (http://pdfx.cs.man.ac.uk/).
        Parameters:
        inputFilePath - the path of the input PDF file
        timeout - set the socket timeout in milliseconds
        Returns:
      • processPDF

        public static String processPDF(byte[] inputBytes,
                                        int timeout)
        Get a PDF file (max 5 MB) as a byte array and transform it into an annotated XML file by means of the PDFX Web Service (http://pdfx.cs.man.ac.uk/).
        Parameters:
        inputBytes - input byte array
        timeout - set the socket timeout in milliseconds
        Returns:
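
        Because the PDFX Web Service accepts files of at most 5 MB, it can be useful to validate a byte array locally before submitting it. The following is a minimal sketch of such a check. The class PdfInputCheckSketch is hypothetical and not part of PDFXConn, the limit is assumed to mean 5 * 1024 * 1024 bytes, and the header test is a strict one that expects the "%PDF-" marker at the very start of the data.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical pre-flight check (not part of PDFXConn) for processPDF input. */
final class PdfInputCheckSketch {

    // Assumption: the documented "max 5 MB" limit is 5 * 1024 * 1024 bytes.
    static final int MAX_BYTES = 5 * 1024 * 1024;

    /** True if the array is non-empty and no larger than the assumed limit. */
    static boolean withinSizeLimit(byte[] data) {
        return data != null && data.length > 0 && data.length <= MAX_BYTES;
    }

    /** True if the data starts with the PDF header marker "%PDF-". */
    static boolean hasPdfHeader(byte[] data) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data == null || data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    static boolean readyForUpload(byte[] data) {
        return withinSizeLimit(data) && hasPdfHeader(data);
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readyForUpload(sample));
        // With the library on the classpath, a guarded call could look like this
        // (the timeout value is illustrative only):
        // if (readyForUpload(bytes)) { String xml = PDFXConn.processPDF(bytes, 30000); }
    }
}
```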
      • convertFilesAndStore

        public static int convertFilesAndStore(String fileOrDirFullPath,
                                               boolean enablePDFcompression,
                                               float compressionFactor,
                                               boolean recursiveDir,
                                               int timeout)
        Convert a PDF file, or all PDF files in a directory (optionally recursing into subdirectories), to XML by means of PDFX (http://pdfx.cs.man.ac.uk/). Compression of the images in the PDF files can be enabled, together with a compression factor.
        Parameters:
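        fileOrDirFullPath - the full path of a PDF file or of a directory containing PDF files
        enablePDFcompression - if true, the images in the PDF files are compressed
        compressionFactor - the compression factor to apply when compression is enabled
        recursiveDir - if true, subdirectories are processed recursively
        timeout - set the socket timeout in milliseconds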
        Returns:
        The number of PDF files correctly converted and stored
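
        To illustrate what the fileOrDirFullPath and recursiveDir parameters select, the sketch below collects the PDF files under a path with the JDK's Files.walk. It is not the library's implementation: the class PdfFileCollectorSketch is hypothetical, and it assumes that PDF files are recognised by a case-insensitive ".pdf" extension and that a non-recursive run only looks at the files directly inside the directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper (not part of PDFXConn): lists the PDF files to convert. */
final class PdfFileCollectorSketch {

    /** Returns the PDF files at or under the given path, sorted by path. */
    static List<Path> collectPdfs(Path fileOrDir, boolean recursive) {
        if (Files.isRegularFile(fileOrDir)) {
            return isPdf(fileOrDir)
                    ? Collections.singletonList(fileOrDir)
                    : Collections.<Path>emptyList();
        }
        // Depth 1 visits only the direct children of the directory.
        int depth = recursive ? Integer.MAX_VALUE : 1;
        try (Stream<Path> paths = Files.walk(fileOrDir, depth)) {
            return paths.filter(Files::isRegularFile)
                        .filter(PdfFileCollectorSketch::isPdf)
                        .sorted()
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Assumption: a PDF file is any file whose name ends in ".pdf", ignoring case. */
    static boolean isPdf(Path p) {
        return p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path pdf : collectPdfs(root, true)) {
            System.out.println(pdf);
        }
    }
}
```

        The size of the returned list gives an upper bound to compare against the count returned by convertFilesAndStore, which only counts files that were correctly converted and stored.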