Third Workshop on Multilingual Surface Realisation
Barcelona, December 12th, 2020
- The Multilingual Surface Realisation workshop series aims to bring together researchers interested in surface-oriented Natural Language Generation problems such as word order determination, inflection, functional word determination, paraphrasing, etc., especially in a multilingual context. The 2020 edition of the workshop, MSR’20, will include the presentation of the results of the Surface Realisation Shared Task 2020 as well as a number of technical papers on (M)SR topics. The workshop will be held at COLING'20 in Barcelona on 12 December 2020.
- For information about the organisation of, and attendance at, the conference and workshops, please refer to the COLING FAQ.
Call for papers
- Natural Language Generation (NLG) is garnering growing interest both as a stand-alone task (e.g., data-to-text and text-to-text generation) and as a component of larger applications (e.g., abstractive summarization, dialogue-based interaction, question answering). Since 2017, three ‘deep’ NLG shared tasks focusing on English generation from abstract semantic representations have been organised: WebNLG, SemEval Task 9 and E2E. In 2018 and 2019, we organised the first shared tasks focusing on multilingual surface realisation (SR'18 and SR'19). The paradigm shift in NLP from traditional supervised machine learning techniques to deep learning, with the substantial quality improvements it has brought, is beginning to have a particularly transformative effect on NLG.
- In parallel with the boost from deep learning techniques, the last few years have seen a push in the annotation of multilingual treebanks with Universal Dependencies (UD), so that resources for a large number of languages are now available: version 2.6 currently provides 163 treebanks covering 92 languages, all freely downloadable. UD treebanks facilitate the development of applications that can potentially work across all UD languages in a uniform fashion. As has already been seen in parsing, these treebanks are also a good basis for multilingual shared tasks: a system built for some of the languages may, with some adjustments, work for the others as well. SR’18 and SR’19 were the first steps towards taking advantage of these rich resources for NLG. While the MSR workshops host the SR shared tasks and the presentation of their results by the participating teams, they have a much broader scope, seeking to provide a forum also for other SR research and for more general work on the role of UD structures and the linkages they afford to the generation and parsing fields.
- MSR’20 invites contributions on all topics that are related to multilingual and monolingual surface realisation in NLG, specifically including and encouraging reversible methods. We welcome all submissions that address problems of surface-oriented generation such as grammatical and/or information structure-driven word order determination, inflection, functional word determination, paraphrasing, etc. We particularly encourage the submission of papers that make a clear contribution to the progress in robust multilingual surface generation, i.e. present methods easily portable from one language to another and clearly scalable. Topics of interest include, but are not limited to:
- Linearisation in NLG
- Multilingual approaches to surface realisation
- Function word generation
- Inflection in NLG
- Joint generation from abstract representations
- Surface-oriented text simplification
- Surface-oriented spoken language generation
- Application of surface realisation for grammatical error correction
- NLG in surface-oriented paraphrasing
- Deep learning approaches to NLG
Shared Task
- This year's shared task event uses the same training, validation and in-domain test sets as the SR’19 Shared Task, with the same data and resource restrictions (see the SR’20 webpage for details). However, this year’s shared task differs in two respects: (i) the addition of an "open" mode in each track, where there are no restrictions on the resources that can be used, and (ii) the introduction of new evaluation datasets.
- As in previous years, the goal of the shared task is to generate a well-formed sentence given the input structure, and there are two tracks with different levels of complexity:
- Shallow Track (Track 1): This track starts from vanilla UD structures in which word order information has been removed and tokens have been lemmatised, i.e. the inputs are unordered dependency trees with lemmatised nodes that contain PoS tags and morphological information as found in the original annotations. The task is equivalent to determining the word order and inflecting the words. As indicated above, there will be both a closed (T1a) and an open (T1b) subtrack.
- Deep Track (Track 2): This track starts from UD structures from which functional words (in particular, auxiliaries, functional prepositions and conjunctions) and surface-oriented morphological information have been removed. In addition to what has to be done for the Shallow Track, the Deep Track thus involves reintroducing the removed functional words and morphological features. Again, there will be a closed (T2a) and an open (T2b) subtrack.
Full details of both tasks can be found on the SR'20 website.
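To make the two input types more concrete, here is a minimal Python sketch that derives simplified Shallow-style and Deep-style inputs from a UD sentence in CoNLL-U format. It is not the organisers' conversion procedure: the function names (`parse_conllu`, `shallow_input`, `deep_input`), the set of UD relations used to approximate "functional words", and the choice to blank out word forms are illustrative assumptions, and the removal of surface-oriented morphological features is not shown. The official input format is the one documented on the SR'20 website.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, replace

# Hypothetical approximation of "functional words": UD base relations whose
# tokens are dropped in the deep-style input (NOT the official SR'20 rules).
FUNCTION_RELS = {"aux", "case", "cop", "mark", "cc"}


@dataclass(frozen=True)
class Token:
    id: int
    form: str
    lemma: str
    upos: str
    feats: str
    head: int      # 0 marks the artificial root
    deprel: str


def parse_conllu(block: str) -> list[Token]:
    """Read one CoNLL-U sentence, skipping comments, multiword ranges and empty nodes."""
    tokens = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        tokens.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                            cols[5], int(cols[6]), cols[7]))
    return tokens


def shallow_input(tokens: list[Token], seed: int = 0) -> list[Token]:
    """Shallow-style input: shuffle and renumber tokens, keep lemma, PoS and features."""
    rng = random.Random(seed)
    order = list(tokens)
    rng.shuffle(order)
    new_id = {t.id: i + 1 for i, t in enumerate(order)}
    new_id[0] = 0  # the root attachment keeps head 0
    # Remapping heads preserves the tree; blanking forms leaves only lemmas.
    return [replace(t, id=new_id[t.id], form="_", head=new_id[t.head])
            for t in order]


def deep_input(tokens: list[Token]) -> list[Token]:
    """Deep-style input: drop function-word tokens and reattach their dependents."""
    removed = {t.id for t in tokens if t.deprel.split(":")[0] in FUNCTION_RELS}
    head_of = {t.id: t.head for t in tokens}

    def surviving_head(h: int) -> int:
        # Climb past removed tokens; stops at a kept token or at the root (0).
        while h in removed:
            h = head_of[h]
        return h

    return [replace(t, head=surviving_head(t.head))
            for t in tokens if t.id not in removed]


# Toy English sentence "The cat has eaten ." (10 tab-separated CoNLL-U columns).
ROWS = [
    ["1", "The", "the", "DET", "DT", "Definite=Def|PronType=Art", "2", "det", "_", "_"],
    ["2", "cat", "cat", "NOUN", "NN", "Number=Sing", "4", "nsubj", "_", "_"],
    ["3", "has", "have", "AUX", "VBZ",
     "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin", "4", "aux", "_", "_"],
    ["4", "eaten", "eat", "VERB", "VBN", "Tense=Past|VerbForm=Part", "0", "root", "_", "_"],
    ["5", ".", ".", "PUNCT", ".", "_", "4", "punct", "_", "_"],
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)

if __name__ == "__main__":
    sentence = parse_conllu(SAMPLE)
    for t in shallow_input(sentence, seed=1):
        print(t.id, t.lemma, t.upos, t.feats, t.head, t.deprel)
    print([t.lemma for t in deep_input(sentence)])  # ['the', 'cat', 'eat', '.']
```

Chaining `shallow_input(deep_input(tokens))` gives an unordered tree without the auxiliary, which is closer in spirit to the Deep Track setting; a realiser then has to recover both the word order and the missing "has".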
Important Dates
12 July 2020: Shared Task starts
28 July 2020: First call for papers
20 August 2020: Second call for papers
8 October 2020: Paper and system description submission deadline
31 October 2020: Camera-ready papers due
12 December 2020: Workshop date
Submissions
- We invite long papers (8 pages) and short papers (4 pages), both with unlimited pages for references. Final versions will be given one additional page of content (up to 9 and 5 pages, respectively, in the proceedings), again with unlimited pages for references.
- MSR 2020 uses a double-blind reviewing process. Papers must conform to the official COLING’20 style guidelines, be in PDF format, and be submitted via the Softconf START conference management system. The submission deadline for long and short workshop papers and for SR’20 system descriptions is 8 October 2020.
- To encourage inclusiveness and the presentation of speculative and recent work, inclusion in the conference proceedings will be made optional. The author’s preference should be indicated with the final submission.
- Multiple submissions policy: Multiple submissions are allowed, but authors should clearly indicate whether they have submitted, or plan to submit, a paper with the same content to another venue.
Templates, guidelines and other policies: Please refer to the COLING website for new policies for submission, review, and citation, and official style guidelines.
Registration
- For registration information, please visit the COLING’20 registration page.
Program
- The workshop will consist of technical presentations, the presentation of the shared task results, an invited talk and a discussion session.
- We are happy to announce that Yue Zhang (Westlake University) will be an invited speaker at the workshop:
14:00 | Opening |
14:15 | Invited talk: Yue Zhang. AMR to text generation: a brief review and a case study using back-parsing |
15:00 | Oral presentation: The Third Multilingual Surface Realisation Shared Task: Overview and Evaluation Results. Simon Mille, Anja Belz, Bernd Bohnet, Thiago Castro Ferreira, Yvette Graham, Leo Wanner |
15:30 | Break |
15:50 | Short presentations and Q&A with authors |
15:50 | BME-TUW at SR’20: Lexical grammar induction for surface realization. Gábor Recski, Ádám Kovács, Kinga Gémes, Judit Ács and Andras Kornai |
16:00 | ADAPT at SR’20: How Preprocessing and Data Augmentation Help to Improve Surface Realization. Henry Elder |
16:10 | IMSurReal Too: IMS in the Surface Realization Shared Task 2020. Xiang Yu, Simon Tannert, Ngoc Thang Vu and Jonas Kuhn |
16:20 | Lexical Induction of Morphological and Orthographic Forms for Low-Resourced Languages. Taha Tobaili |
16:30 | NILC at SR’20: Exploring Pre-Trained Models in Surface Realisation. Marco Antonio Sobrevilla Cabezudo and Thiago Pardo |
16:40 | Surface Realization Using Pretrained Language Models. Farhood Farahnak, Laya Rafiee, Leila Kosseim and Thomas Fevens |
16:50 | Break |
17:00 | Panel/Discussions |
18:00 | Closing |
Proceedings
- You can download the proceedings, including details of the task results and participating systems, from the ACL Anthology.
Programme Committee
Miguel Ballesteros, Amazon-AWS AI, USA
Valerio Basile, Torino University, Italy
Alberto Bugarín, University of Santiago de Compostela, Spain
Wenchao Du, Carnegie Mellon University, USA
William Dyer, Oracle Corporation, USA
Henry Elder, ADAPT Center DCU, Ireland
Farhood Farahnak, Concordia University, Canada
Kim Gerdes, Sorbonne Nouvelle, France
Xudong Hong, MPI Informatics and Saarland University, Germany
Yannis Konstas, Heriot Watt University, UK
Emiel Krahmer, Tilburg University, The Netherlands
Guy Lapalme, RALI, Université de Montréal, Canada
Elena Lloret, Universitat d’Alacant, Spain
Alessandro Mazzei, Torino University, Italy
Ryan McDonald, Google Research, USA
David McDonald, Sift Inc., USA
Joakim Nivre, Uppsala University, Sweden
Laura Perez, Pompeu Fabra University, Spain
Yevgeniy Puzikov, TU Darmstadt, Germany
Gábor Recski, TU Wien, Austria
Leonardo Ribeiro, TU Darmstadt, Germany
Horacio Saggion, Pompeu Fabra University, Spain
Anastasia Shimorina, LORIA, France
Aleksander Shvets, Pompeu Fabra University, Spain
Xiang Yu, University of Stuttgart, Germany
Yue Zhang, Westlake University, China
Contact
- Please send us an email at msr.organizers@gmail.com if you have any questions.
Organising committee
Simon Mille | TALN, Pompeu Fabra University, Barcelona, Spain |
Anya Belz | University of Brighton, UK |
Bernd Bohnet | Google Research, London, UK |
Thiago Castro Ferreira | Federal University of Minas Gerais, Brazil |
Yvette Graham | ADAPT Center, Dublin City University, Ireland |
Leo Wanner | TALN, Pompeu Fabra University and ICREA, Barcelona, Spain |
Funding
- (1) Science Foundation Ireland (sfi.ie) under the SFI Research Centres Programme co-funded under the European Regional Development Fund, grant number 13/RC/2106 (ADAPT Centre for Digital Content Technology, www.adaptcentre.ie) at Dublin City University;
(2) the Applied Data Analytics Research & Enterprise Group, University of Brighton, UK; and
(3) the European Commission under H2020, via contracts to UPF with the numbers 825079-STARTS (MindSpaces), 786731-RIA (CONNEXIONs) and 779962-RIA (V4Design).