Package edu.upf.taln.dri.lib.postproc
Class SpanishParser
java.lang.Object
    edu.upf.taln.dri.lib.postproc.SpanishParser
public class SpanishParser extends Object
EXPERIMENTAL!!!
Field Summary
Fields
Modifier and Type: protected static Map<LangENUM,MateParser>
Field: MateParsersLang_Resource
Constructor Summary
Constructors
Constructor: SpanishParser()
Method Summary
Static Methods
Modifier and Type: static void
Method: main(String[] args)
Description: EXPERIMENTAL! This main class parses all the Spanish contents of the papers processed by the Dr. Inventor Text Mining Framework and stored as an XML file.
Field Detail
MateParsersLang_Resource
protected static Map<LangENUM,MateParser> MateParsersLang_Resource
Method Detail
main
public static void main(String[] args) throws InternalProcessingException
EXPERIMENTAL! This main class parses all the Spanish contents of the papers processed by the Dr. Inventor Text Mining Framework and stored as an XML file.
Suppose you have papers (in PDF or JATS XML format) with multilingual contents, in particular papers that include both English and Spanish text excerpts.
In order to analyze these texts you should:
1) Enable the Dr. Inventor library to deal with multilingual contents by setting MultiLangSupport to true. This is done through the related configuration flag of the class ModuleConfig;
2) Analyze the multilingual papers by means of the Dr. Inventor library and store the contents as an XML file: Factory.getXMLString()
3) Execute the current main class by passing the following program arguments:
[0]: the full local path of the folder to process; this directory and its subdirectories will be visited and all XML files will be processed by means of the Spanish parser
[1]: the full local path of the DRI resource folder
[2]: OPTIONAL - string used to select the files to parse. If not empty, only files starting with this string will be parsed
All .xml files in the folder to process and its subfolders will be processed by means of the Spanish parser, and the results will be written to XML files with the same name as the original one but ending in '_ESpars.xml'.
Parameters:
args
Throws:
InternalProcessingException
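Example (illustrative sketch):
The file selection and output naming described for main can be sketched as follows. This is not the actual SpanishParser code: the class EsParsFileSelection and its helper methods are hypothetical names, the sketch relies only on the standard java.nio.file API, and it assumes that "files starting with this string" refers to the file name. It only lists which files would be parsed and where the results would be written; the DRI resource folder argument is accepted but not used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the Dr. Inventor library.
public class EsParsFileSelection {

    // True if the file name ends in ".xml" and, when a non-empty prefix is
    // given, also starts with that prefix (assumption: the prefix is matched
    // against the file name, not the full path).
    public static boolean isSelected(String fileName, String prefix) {
        if (!fileName.endsWith(".xml")) {
            return false;
        }
        return prefix == null || prefix.isEmpty() || fileName.startsWith(prefix);
    }

    // Derives the output name: same name as the input, ending in "_ESpars.xml".
    public static String outputName(String fileName) {
        String base = fileName.substring(0, fileName.length() - ".xml".length());
        return base + "_ESpars.xml";
    }

    // Visits the folder and its subfolders and returns the selected files.
    public static List<Path> collect(Path root, String prefix) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> isSelected(p.getFileName().toString(), prefix))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: <folder to process> <DRI resource folder> [file name prefix]");
            return;
        }
        Path root = Paths.get(args[0]);
        // args[1] (DRI resource folder) is not needed for this listing-only sketch.
        String prefix = args.length > 2 ? args[2] : "";
        for (Path in : collect(root, prefix)) {
            Path out = in.resolveSibling(outputName(in.getFileName().toString()));
            System.out.println(in + " -> " + out);
        }
    }
}
```

Running this sketch against the folder passed as argument [0] gives a preview of the input and output file pairs before the real parser is executed.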