Tutorial @ COLING 2018 (20th August 2018)

Sanja Štajner and Horacio Saggion



In this tutorial, we aim to provide an extensive overview of the automatic text simplification (ATS) systems proposed so far and the methods they use, and to discuss the strengths and shortcomings of each, providing a direct comparison of their outputs. We aim to break some common misconceptions about what text simplification (TS) is and what it is not, and how much it has in common with text summarisation and machine translation. We believe that a deeper understanding of the initial motivations, together with an in-depth analysis of existing TS methods, would help researchers new to ATS propose even better systems, bringing fresh ideas from other related NLP areas. We will describe and explain all of the most influential methods used for automatic simplification of texts so far, with emphasis on their strengths and weaknesses as observed in a direct comparison of system outputs. We will present all the existing resources for TS for various languages, including parallel manually produced TS corpora, comparable automatically aligned TS corpora, paraphrase and synonym resources, TS-specific sentence-alignment tools, and several TS evaluation resources. Finally, we will discuss the existing evaluation methodologies for TS and the necessary conditions for using each of them.



  • Motivation for ATS:
    • Problems for various NLP tools and applications
    • Reading difficulties of various target populations
  • TS projects:
    • Short description of TS projects (PSET, Simplext, PorSimples, FIRST, SIMPATICO)
    • Discussion of the TS projects (what they share and how they differ)
  • TS resources:
    • Resources for lexical simplification
    • Resources for lexico-syntactic simplification
    • Resources for languages other than English
  • Evaluation of TS systems:
    • Automatic evaluation
    • Human evaluation
  • Comparison of non-neural TS approaches:
    • Rule-based systems
    • Data-driven systems (supervised and unsupervised)
    • Hybrid systems
    • Semantically-motivated ATS systems
  • Neural text simplification (NTS) systems:
    • State-of-the-art neural text simplification (NTS) systems
    • Direct comparison of NTS systems
    • Strengths and weaknesses of NTS systems
  • NTS systems vs. previously proposed (non-neural) ATS systems (direct comparison)
  • Current challenges in ATS
Tutorial Slides Download (current version)


Sanja Štajner is currently a postdoctoral research fellow at the University of Mannheim, Germany. She holds a multiple Master's degree in Natural Language Processing and Human Language Technologies (Autonomous University of Barcelona, Spain and University of Wolverhampton, UK) and a PhD degree in Computer Science from the University of Wolverhampton on the topic of “Data-driven Text Simplification”. She participated in the Simplext and FIRST projects on automatic text simplification, and is the lead author of four ACL papers on text simplification (including the first neural text simplification system) and numerous other papers on text simplification and readability assessment at various leading international conferences and journals. Sanja’s interests in text simplification include building tools for automatic sentence alignment, building ATS systems using various approaches (machine translation, neural machine translation, event detection, unsupervised lexical simplification), complex word identification (from eye-tracking data and crowdsourced data), and evaluation of text simplification systems. Sanja regularly teaches NLP at Master's and PhD levels, delivers invited talks and seminars at various universities and companies, and gave a very successful tutorial on “Deep Learning for Text Simplification” at RANLP 2017. She is an area chair for COLING 2018, and a regular program committee member of ACL, EMNLP, LREC, IJCAI, IAAA and other international conferences and journals. She was a lead organizer of the first international workshop and shared task on Quality Assessment of Text Simplification (QATS) in 2016, and is a lead organizer of the Complex Word Identification shared task in 2018.
Horacio Saggion holds a PhD in Computer Science from Université de Montréal, Canada. He obtained his BSc in Computer Science from Universidad de Buenos Aires in Argentina, and his MSc in Computer Science from UNICAMP in Brazil. Horacio is an Associate Professor at the Department of Information and Communication Technologies, Universitat Pompeu Fabra (UPF), Barcelona. He is a member of the Natural Language Processing group, where he works on automatic text summarization, text simplification, information extraction, sentiment analysis and related topics. His research is empirical, combining symbolic, pattern-based approaches with statistical and machine learning techniques. Before joining Universitat Pompeu Fabra, he worked at the University of Sheffield on a number of UK and European research projects (SOCIS, MUMIS, MUSING, GATE, CUBREPORTER) developing competitive human language technology. He was also an invited researcher at Johns Hopkins University for a project on multilingual text summarization. He is currently principal investigator for UPF in several EU and national projects. Horacio has published over 100 works in leading scientific journals, conferences, and books in the field of human language technology. He organized four international workshops in the areas of text summarization and information extraction and was co-chair of STIL 2009 and program chair of SEPLN 2014. He is co-editor of a book on multilingual, multisource information extraction and summarization published by Springer in 2013 and author of the book Automatic Text Simplification (Morgan & Claypool Publishers, 2017). Horacio is a member of the ACL, IEEE, ACM, and SADIO. He is a regular programme committee member for international conferences such as ACL, EACL, COLING, EMNLP, IJCNLP, IJCAI and is an active reviewer for international journals in computer science, information processing, and human language technology.
Horacio has given courses, tutorials, and invited talks at a number of international events including LREC, ESSLLI, IJCNLP, NLDB, and RuSSIR.


Here is a list of some core references related to this tutorial: