Guidelines
Definition of the task (for one instance):
Given a sentence containing a complex word, systems should return an ordered list of “simpler” valid substitutes
for the complex word in its original context. The list of simpler words (up to a maximum of 10) returned by the
system should be ordered by the confidence the system has in its prediction (best predictions first). The ordered
list must not contain ties.
An instance of the task for the English language is:
sentence: That prompted the military to deploy its largest warship, the BRP Gregorio del Pilar, which was
recently acquired from the United States.
complex word: deploy
For this instance a system may suggest the following ranked substitutes: send, move, position, redeploy, employ,
situate…
Systems should only produce simplifications that are good contextual fits (semantically and syntactically).
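The output constraints above (at most 10 substitutes, ordered by confidence, no ties) can be checked with a small sketch. This is an illustrative helper, not part of the official submission format; the function name and the list-of-strings representation are assumptions:

```python
def check_prediction(substitutes):
    """Validate one ranked substitute list against the task constraints:
    at most 10 entries, and no repeated candidates (a flat list encodes
    the ranking, so a duplicate would amount to a tie)."""
    if len(substitutes) > 10:
        return False
    # each candidate may appear only once in the ranking
    return len(set(substitutes)) == len(substitutes)
```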
Participating teams can register (details below) for three different tracks, one per language:
- English monolingual (EN)
- Portuguese (Brazilian) monolingual (PT-BR)
- Spanish monolingual (ES)
It is possible to participate in one, two or all three tracks. Participating teams will be allowed to submit
up to 3 runs per track.
Evaluation Metrics
The evaluation metrics used in the TSAR-2022 Shared Task are the following:
- MAP@K (Mean Average Precision @ K): K={1,3,5,10}. MAP@K is commonly used to evaluate
Information Retrieval models and Recommender Systems. For this Lexical Simplification task, instead of a
ranked list of relevant and irrelevant documents, the ranking being evaluated is the list of predicted
substitutes: a prediction counts as relevant if it matches one of the gold-standard annotations, and
irrelevant otherwise. Traditional Precision, in the context of Lexical Simplification, measures how many
of the predicted substitutes are relevant, but it fails to capture the order in which the correct
predictions appear. Mean Average Precision is designed for exactly this binary-relevance setting
(each candidate either matches the gold annotations or does not), so MAP@K for Lexical Simplification
evaluates two aspects: 1) are the predicted substitutes relevant?, and 2) are the relevant substitutes
ranked at the top positions?
- Potential@K: K={1,3,5,10}. The percentage of instances for which at least one of the substitutions
predicted is present in the set of gold annotations.
- Accuracy@K@top1: K={1,2,3}. The ratio of instances where at least one of the top K predicted
candidates matches one of the most frequently suggested candidates in the gold list of annotations.
Note 1: Potential@1, Precision@1, and MAP@1 will have the same value.
Note 2: The exact computation of the metrics will be provided in the official evaluation script.
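As Note 2 says, the official evaluation script is the authoritative definition of the metrics. As a reading aid only, the three metrics above can be sketched as follows; the function names are hypothetical, and details such as the AP@K normalization (here, division by K) may differ from the official computation:

```python
def average_precision_at_k(predicted, gold, k):
    """AP@K with binary relevance: a prediction is relevant iff it is in gold.
    Normalizing by k is an assumption; the official script is authoritative."""
    hits = 0
    score = 0.0
    for i, cand in enumerate(predicted[:k]):
        if cand in gold:
            hits += 1
            score += hits / (i + 1)  # precision at this rank
    return score / k

def map_at_k(all_predicted, all_gold, k):
    """Mean of AP@K over all instances."""
    return sum(average_precision_at_k(p, g, k)
               for p, g in zip(all_predicted, all_gold)) / len(all_gold)

def potential_at_k(all_predicted, all_gold, k):
    """Fraction of instances with at least one gold match in the top K."""
    return sum(any(c in g for c in p[:k])
               for p, g in zip(all_predicted, all_gold)) / len(all_gold)

def accuracy_at_k_top1(all_predicted, all_gold_counts, k):
    """Fraction of instances where some top-K prediction equals one of the
    most frequently suggested gold candidates (ties for top1 are allowed).
    all_gold_counts: per instance, a dict candidate -> annotation frequency."""
    total = 0
    for preds, counts in zip(all_predicted, all_gold_counts):
        best = max(counts.values())
        top1 = {w for w, c in counts.items() if c == best}
        total += any(c in top1 for c in preds[:k])
    return total / len(all_gold_counts)
```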
Shared Task Paper Submission
Participating teams will be invited to submit system description papers (four pages, with an unlimited
number of pages for references), which will be peer-reviewed by at least 2 reviewers (at least one member
of each participating team will be required to help with the review process). Accepted papers will be
published in the TSAR-2022 Workshop proceedings. Submissions will be made via SoftConf at
https://softconf.com/emnlp2022/tsar-st/. Paper submissions must use the official EMNLP templates, which
are available as an Overleaf template and also downloadable directly (LaTeX and Word). More details for
submission will be communicated to registered teams in due time.