Class PDFXConn
java.lang.Object
    edu.upf.taln.dri.common.connector.pdfx.PDFXConn

public class PDFXConn extends Object
Converts papers in PDF format to XML by means of PDFX (http://pdfx.cs.man.ac.uk/). Also provides utility methods to compress the images included in those PDF files.

Constructor Summary
Constructor    Description
PDFXConn()

Method Summary
Modifier and Type, Method, Description

static int convertFilesAndStore(String fileOrDirFullPath, boolean enablePDFcompression, float compressionFactor, boolean recursiveDir, int timeout)
    Converts a PDF file or, recursively, all PDF files in a directory by means of PDFX (http://pdfx.cs.man.ac.uk/).

static byte[] pdfCompress(byte[] inputPDF, float compressionFactor, boolean greyImages)
    Compresses the images included in a PDF file in order to reduce the file size.

static String processPDF(byte[] inputBytes, int timeout)
    Takes a PDF file (max 5 MB) as a byte array and transforms it into an annotated XML file by means of the PDFX Web Service (http://pdfx.cs.man.ac.uk/).

static Map<String,String> processPDFfile(String inputFilePath, int timeout)
    Takes a PDF file (max 5 MB), identified by its path, and transforms it into an annotated XML file by means of the PDFX Web Service (http://pdfx.cs.man.ac.uk/).

Method Detail

pdfCompress
public static byte[] pdfCompress(byte[] inputPDF, float compressionFactor, boolean greyImages)
Compresses the images included in a PDF file in order to reduce the file size. The input and output file names include full file paths.
Parameters:
inputPDF -
compressionFactor -
greyImages -
Returns:
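
The following is a minimal sketch of one place where pdfCompress might be used: shrinking a PDF only when it exceeds the size accepted by the PDFX Web Service. The class PdfShrinkSketch, the helper shrinkIfTooLarge, the stand-in compressor, the reading of the size limit as 5 * 1024 * 1024 bytes, and the factor 0.5f are all assumptions made for illustration; none of them is part of PDFXConn.

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;

class PdfShrinkSketch {

    /** Assumed reading of the documented size limit: 5 * 1024 * 1024 bytes. */
    static final int MAX_PDFX_BYTES = 5 * 1024 * 1024;

    /**
     * Returns the input unchanged when it already fits within maxBytes;
     * otherwise applies the compressor once and returns its result.
     */
    static byte[] shrinkIfTooLarge(byte[] pdf, int maxBytes, UnaryOperator<byte[]> compressor) {
        if (pdf.length <= maxBytes) {
            return pdf;
        }
        return compressor.apply(pdf);
    }

    public static void main(String[] args) {
        // Stand-in compressor so the sketch runs without the library: keeps the first half.
        // With the library on the classpath one could pass instead:
        //   bytes -> PDFXConn.pdfCompress(bytes, 0.5f, false)
        // (0.5f is an arbitrary value; the valid range is not documented here).
        UnaryOperator<byte[]> standIn = bytes -> Arrays.copyOf(bytes, bytes.length / 2);

        byte[] small = new byte[1024];
        byte[] large = new byte[MAX_PDFX_BYTES + 2];

        System.out.println(shrinkIfTooLarge(small, MAX_PDFX_BYTES, standIn).length);
        System.out.println(shrinkIfTooLarge(large, MAX_PDFX_BYTES, standIn).length);
    }
}
```

Passing the compressor as a function keeps the size check testable without a real PDF; a single compression pass may still leave the file above the limit, so the result should be checked again before upload.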

processPDFfile
public static Map<String,String> processPDFfile(String inputFilePath, int timeout)
Takes a PDF file (max 5 MB), identified by its path, and transforms it into an annotated XML file by means of the PDFX Web Service (http://pdfx.cs.man.ac.uk/).
Parameters:
inputFilePath -
timeout - the socket timeout in milliseconds
Returns:
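
Because the keys of the returned map are not described here, a caller may want to inspect the result before relying on any particular entry. The sketch below shows one way to do that; the class PdfxResultSketch and the helper describe are hypothetical names, and no assumption is made about which keys processPDFfile actually returns.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

class PdfxResultSketch {

    /**
     * Builds one line per map entry, sorted by key, of the form "key: N chars".
     * A null value is reported as 0 chars. Keys are assumed to be non-null.
     */
    static String describe(Map<String, String> result) {
        StringJoiner lines = new StringJoiner("\n");
        for (Map.Entry<String, String> entry : new TreeMap<>(result).entrySet()) {
            int length = entry.getValue() == null ? 0 : entry.getValue().length();
            lines.add(entry.getKey() + ": " + length + " chars");
        }
        return lines.toString();
    }
}
```

With the library available, a call such as `PdfxResultSketch.describe(PDFXConn.processPDFfile(path, 60000))` would list what came back; the 60000 ms timeout is an arbitrary example value.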

processPDF
public static String processPDF(byte[] inputBytes, int timeout)
Takes a PDF file (max 5 MB) as a byte array and transforms it into an annotated XML file by means of the PDFX Web Service (http://pdfx.cs.man.ac.uk/).
Parameters:
inputBytes - input byte array
timeout - the socket timeout in milliseconds
Returns:
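
Since the service accepts only files up to the stated maximum size, it can be useful to reject oversized input locally before any network call is made. The following is a sketch under stated assumptions: PdfxCallSketch and convertChecked are hypothetical names, the limit is read as 5 * 1024 * 1024 bytes, and the converter is passed in as a function so that the example runs without the library.

```java
import java.util.function.BiFunction;

class PdfxCallSketch {

    /** Assumed reading of the documented size limit: 5 * 1024 * 1024 bytes. */
    static final int MAX_PDFX_BYTES = 5 * 1024 * 1024;

    /**
     * Validates the input size, then delegates to the given converter.
     * The converter receives the PDF bytes and the socket timeout in milliseconds.
     */
    static String convertChecked(byte[] pdf, int timeoutMs,
                                 BiFunction<byte[], Integer, String> converter) {
        if (pdf == null || pdf.length == 0) {
            throw new IllegalArgumentException("empty PDF input");
        }
        if (pdf.length > MAX_PDFX_BYTES) {
            throw new IllegalArgumentException(
                "PDF is " + pdf.length + " bytes, above the assumed limit of " + MAX_PDFX_BYTES);
        }
        return converter.apply(pdf, timeoutMs);
    }

    public static void main(String[] args) {
        // Stand-in converter so the sketch runs offline.
        // With the library on the classpath, PDFXConn::processPDF could be passed instead,
        // assuming the method declares no checked exceptions.
        BiFunction<byte[], Integer, String> standIn =
            (bytes, timeout) -> "<stub bytes=\"" + bytes.length + "\"/>";

        System.out.println(convertChecked(new byte[]{1, 2, 3}, 30000, standIn));
    }
}
```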

convertFilesAndStore
public static int convertFilesAndStore(String fileOrDirFullPath, boolean enablePDFcompression, float compressionFactor, boolean recursiveDir, int timeout)
Converts a PDF file or, recursively, all PDF files in a directory by means of PDFX (http://pdfx.cs.man.ac.uk/). Compression of the images in the PDF files can be enabled, together with a compression factor.
Parameters:
fileOrDirFullPath -
enablePDFcompression -
compressionFactor -
recursiveDir -
Returns:
The number of PDF files correctly converted and stored
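
Because the return value is a count of successfully converted files, comparing it with the number of PDF files actually present is a simple way to detect failures. The sketch below counts candidate files with the standard java.nio.file API. PdfCountSketch, hasPdfExtension, and countPdfs are hypothetical names, and matching on a case-insensitive ".pdf" extension is an assumption; the library may select files differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.stream.Stream;

class PdfCountSketch {

    /** True when the last path element ends with ".pdf", ignoring case. */
    static boolean hasPdfExtension(Path path) {
        Path name = path.getFileName();
        return name != null && name.toString().toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    /**
     * Counts regular files with a .pdf extension. A single file is counted as itself;
     * a directory is scanned one level deep, or fully when recursive is true.
     */
    static long countPdfs(Path fileOrDir, boolean recursive) {
        int depth = recursive ? Integer.MAX_VALUE : 1;
        try (Stream<Path> paths = Files.walk(fileOrDir, depth)) {
            return paths.filter(Files::isRegularFile)
                        .filter(PdfCountSketch::hasPdfExtension)
                        .count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("PDF files found: " + countPdfs(root, true));
        // With the library on the classpath, the count could be compared with, e.g.:
        //   int converted = PDFXConn.convertFilesAndStore(root.toString(), false, 0f, true, 60000);
        // The argument values shown are arbitrary examples, not documented defaults.
    }
}
```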