This post discusses my highlights of ACL 2021, including challenges in benchmarking, machine translation, model understanding, and multilingual NLP.
ACL 2021 took place virtually from 1–6 August 2021. Here are my highlights from the conference:
NLP benchmarking is broken
Many talks and papers referred to the current state of NLP benchmarking, in which existing benchmarks have largely been outpaced by rapidly improving pre-trained models.
My favourite resources on this topic from the conference are:
I've also written a longer blog post that gives a broader overview of different perspectives, challenges, and potential solutions for improving benchmarking in NLP.
NLP is all about pre-trained Transformers
This should come as no surprise, but it is still interesting to see that among the 14 “hot” topics of 2021 (see below) were five pre-trained models (BERT, RoBERTa, BART, GPT-2, XLM-R) and one general “Language models” topic. These models are essentially all variants of the same Transformer architecture.
This serves as a useful reminder that the community is overfitting to a particular setting and that it may be worthwhile to look beyond the standard Transformer model (see my recent publication for some inspiration).
There were a few papers that sought to improve the general Transformer architecture for processing long and short documents, respectively:
Machine translation
As in previous years, machine translation was one of the most popular tracks of the conference, just behind the general ML track in terms of the number of submissions, as can be seen below.
Three of the six outstanding papers are on MT:
- Scientific Credibility of Machine Translation Research: A Meta-Evaluation of 769 Papers. Marie et al. study how credible the evaluation in MT papers actually is. They find that most papers used BLEU, and 74.3% used BLEU only. 108 new MT metrics have been proposed in the last decade, but none is used consistently. Unsurprisingly, most papers do not perform statistical significance testing. An increasing number of papers copy scores from previous work. Scores are often reported with different variants of the BLEU script and are therefore not comparable. The authors provide the following guidelines for MT evaluation: don't rely on BLEU alone, do statistical significance testing, don't copy numbers from prior work, and compare systems using the same pre-processed data. More recently, in 2022, Benjamin Marie has written additional posts analysing shortcomings in the MT evaluation of prominent papers.
- Neural Machine Translation with Monolingual Translation Memory. Cai et al. combine neural networks with a non-parametric memory.
- Vocabulary Learning via Optimal Transport for Neural Machine Translation. Xu et al. frame vocabulary learning as optimal transport. They propose the marginal utility of vocabularization (MUV) as a measure of how good a vocabulary is.
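To make the significance-testing guideline concrete, here is a minimal sketch of paired bootstrap resampling, one common test for comparing two MT systems on the same test set. It assumes a metric that is the mean of per-segment scores, and the scores below are hypothetical. Corpus-level BLEU does not decompose this way, so for BLEU you would recompute the corpus score on every resample; recent versions of sacreBLEU ship built-in paired significance tests for this. The function name `paired_bootstrap` is my own and not taken from the paper.

```python
import random
from statistics import mean


def paired_bootstrap(scores_a, scores_b, n_samples=1000, seed=0):
    """Paired bootstrap resampling over test segments.

    scores_a and scores_b are per-segment scores of systems A and B on the
    same test set, in the same order. Returns the observed mean difference
    (A - B) and an estimated p-value: the fraction of resampled test sets on
    which A does NOT outperform B.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)
    n = len(scores_a)
    observed = mean(scores_a) - mean(scores_b)
    wins = 0
    for _ in range(n_samples):
        # Draw a test set of the same size with replacement. The SAME indices
        # are used for both systems, which is what makes the test "paired".
        idx = [rng.randrange(n) for _ in range(n)]
        delta = mean([scores_a[i] for i in idx]) - mean([scores_b[i] for i in idx])
        if delta > 0:
            wins += 1
    return observed, 1.0 - wins / n_samples


if __name__ == "__main__":
    # Hypothetical per-segment scores for two systems on an 8-segment test set.
    sys_a = [0.42, 0.55, 0.61, 0.38, 0.70, 0.49, 0.58, 0.66]
    sys_b = [0.40, 0.50, 0.63, 0.35, 0.64, 0.47, 0.52, 0.60]
    diff, p = paired_bootstrap(sys_a, sys_b)
    print(f"A - B = {diff:.3f}, estimated p = {p:.3f}")
```

Reporting the p-value next to the score difference only means something if both systems are scored with the same metric implementation on the same pre-processed data, which is why the guidelines above go together.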
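The idea behind marginal utility can be illustrated with a toy example. This is a simplified sketch under my own assumptions, not the VOLT implementation. I take the "entropy" of a vocabulary to be the Shannon entropy of its token distribution on a corpus, divided by the average character length of the vocabulary entries. I take the marginal utility to be the drop in that entropy per added vocabulary entry. The greedy longest-match `tokenize` function and the four-word corpus are hypothetical stand-ins for a real subword segmentation and training corpus.

```python
import math
from collections import Counter


def tokenize(word, vocab):
    """Greedy longest-match segmentation (hypothetical stand-in for BPE)."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"cannot segment {word!r} at position {i}")
    return tokens


def vocab_entropy(corpus, vocab):
    """Entropy of the token distribution, normalised by average entry length."""
    counts = Counter(tok for word in corpus for tok in tokenize(word, vocab))
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    avg_len = sum(len(entry) for entry in vocab) / len(vocab)
    return entropy / avg_len


def marginal_utility(corpus, small_vocab, large_vocab):
    """Entropy reduction per additional vocabulary entry (higher is better)."""
    m = len(large_vocab) - len(small_vocab)
    return -(vocab_entropy(corpus, large_vocab) - vocab_entropy(corpus, small_vocab)) / m


corpus = ["lower", "lowest", "newer", "wider"]
chars = {ch for word in corpus for ch in word}  # character-level vocabulary
merged = chars | {"low", "er"}                  # two extra subword entries
print(round(marginal_utility(corpus, chars, merged), 2))  # 0.21 on this toy corpus
```

As I understand the paper, this kind of score is what the optimal-transport search maximises over candidate vocabularies; the sketch only shows how the score compares two given vocabularies.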
Within machine translation, there were a couple of papers that I particularly enjoyed:
There were also some papers that focused on machine translation for low-resource language varieties without using parallel data for those languages:
Model understanding
Gaining a better understanding of the behaviour of current models was another theme of the conference, with three of the six outstanding papers falling into this area:
- Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning. This paper analyses fine-tuning through the lens of intrinsic dimension, roughly the number of parameters that must be optimised (within a random low-dimensional subspace of the full parameter space) to reach 90% of the performance of full fine-tuning. It shows that common pre-trained models have a very low intrinsic dimension. The authors also show that pre-training implicitly minimises the intrinsic dimension and that larger models tend to have a lower intrinsic dimension. Intrinsic dimension is a highly relevant concept for the study and design of efficient pre-trained models, which we covered in an EMNLP 2022 tutorial.
- Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering. This paper investigates why active learning fails on VQA. The authors observe that the acquired examples are collective outliers, i.e., groups of examples that are hard or impossible for current models to learn. Removing such hard outliers from the pool makes active learning considerably more effective.
- UnNatural Language Inference. This paper changes the word order of NLI sentences to investigate whether models “know syntax”. The authors find that state-of-the-art NLI models are largely invariant to word order changes. They observe that some distributional information (POS neighbourhood) may be useful for performing well in the permuted setup. Unsurprisingly, human annotators struggle with the permuted sentences.
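Intrinsic dimension is measured by fine-tuning only within a random subspace, θ = θ₀ + P z, where P is a fixed random projection and only the d-dimensional vector z is trained. It is the smallest d that recovers 90% of full fine-tuning performance. The toy sketch below is my own illustration, not the paper's setup. The "model" has D = 1000 parameters, but its quadratic loss depends on only k = 8 of them, so its intrinsic dimension should be at most 8. A dense Gaussian P stands in for the structured projections used in practice. Instead of running gradient descent on z, the sketch computes the best loss reachable in each subspace in closed form (a least-squares projection). It uses 90% of the loss reduction achieved by full fine-tuning as a stand-in for "90% of performance".

```python
import math
import random


def residual_sq(columns, b, tol=1e-12):
    """Squared distance from b to the span of `columns` (Gram-Schmidt projection)."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            dot = sum(vi * qi for vi, qi in zip(v, q))
            v = [vi - dot * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > tol:  # skip numerically dependent directions
            basis.append([vi / norm for vi in v])
    r = list(b)
    for q in basis:
        dot = sum(ri * qi for ri, qi in zip(r, q))
        r = [ri - dot * qi for ri, qi in zip(r, q)]
    return sum(ri * ri for ri in r)


def intrinsic_dimension(D=1000, k=8, d_max=16, threshold=0.9, seed=0):
    """Smallest subspace size d that recovers `threshold` of full fine-tuning's loss reduction."""
    rng = random.Random(seed)
    theta0 = [rng.gauss(0.0, 1.0) for _ in range(D)]  # "pre-trained" parameters
    target = [rng.gauss(0.0, 1.0) for _ in range(k)]  # optimum of the toy task
    # Toy loss: sum over the first k coordinates of (theta_i - target_i)^2.
    # Full fine-tuning of all D parameters can drive it to exactly 0.
    b = [target[i] - theta0[i] for i in range(k)]
    initial_loss = sum(x * x for x in b)
    # Fixed random projection P (D x d_max); taking its first d columns gives nested subspaces.
    P = [[rng.gauss(0.0, 1.0) / math.sqrt(D) for _ in range(d_max)] for _ in range(D)]
    for d in range(1, d_max + 1):
        # Only the first k rows of P affect this loss, so min over z of loss(theta0 + P z)
        # is the squared residual of b after projecting onto the span of those rows' columns.
        cols = [[P[i][j] for i in range(k)] for j in range(d)]
        best_loss = residual_sq(cols, b)
        if initial_loss - best_loss >= threshold * initial_loss:
            return d
    return None


print(intrinsic_dimension())  # some d <= 8, although the model has 1000 parameters
```

The toy only makes the definition tangible. The paper's finding is that real pre-trained language models also have an intrinsic dimension far below their parameter count.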
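To illustrate the active learning finding, here is a sketch of entropy-based uncertainty sampling with an optional outlier filter. How outliers are identified is left as an assumption: the hypothetical `select_batch` function simply takes a boolean mask, for example flags for examples that models consistently fail to learn. The predicted distributions below are made up. The point is that plain uncertainty sampling tends to pick the most confusing examples, which may be exactly the unlearnable ones, while excluding them changes what gets acquired.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_batch(pool_probs, batch_size, is_outlier=None):
    """Return indices of the `batch_size` most uncertain pool examples.

    pool_probs: predicted class distributions, one per unlabelled example.
    is_outlier: optional list of booleans; flagged examples are never acquired.
    """
    candidates = [
        i for i in range(len(pool_probs))
        if is_outlier is None or not is_outlier[i]
    ]
    candidates.sort(key=lambda i: entropy(pool_probs[i]), reverse=True)
    return candidates[:batch_size]


pool = [
    [0.34, 0.33, 0.33],  # most uncertain, but suppose it is unanswerable
    [0.50, 0.30, 0.20],
    [0.90, 0.05, 0.05],
    [0.40, 0.40, 0.20],
]
outliers = [True, False, False, False]
print(select_batch(pool, 2))            # [0, 3]: plain uncertainty sampling
print(select_batch(pool, 2, outliers))  # [3, 1]: outlier filtered out
```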
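The word-order probe from UnNatural Language Inference can be approximated as follows: shuffle the words of the premise and hypothesis, re-run the model, and count how often the predicted label stays the same. This is a simplified sketch under my own assumptions, not the paper's exact protocol or metrics. It uses uniformly random shuffles that differ from the original order, and `model(premise, hypothesis)` is a hypothetical callable that returns a label. The toy bag-of-words model is deliberately order-blind, so it keeps its prediction on every permutation. The paper reports largely the same behaviour for much stronger models.

```python
import random
from typing import Callable, List, Tuple


def shuffled_variants(sentence: str, n: int, rng: random.Random) -> List[str]:
    """Return n random word-order permutations that differ from the original."""
    words = sentence.split()
    if len(set(words)) < 2:
        return []  # no reordering can change this sentence
    variants = []
    while len(variants) < n:
        perm = words[:]
        rng.shuffle(perm)
        if perm != words:
            variants.append(" ".join(perm))
    return variants


def permutation_acceptance(
    model: Callable[[str, str], str],
    pairs: List[Tuple[str, str]],
    n_perms: int = 10,
    seed: int = 0,
) -> float:
    """Fraction of permuted inputs on which the model keeps its original prediction."""
    rng = random.Random(seed)
    kept = total = 0
    for premise, hypothesis in pairs:
        original = model(premise, hypothesis)
        for p, h in zip(shuffled_variants(premise, n_perms, rng),
                        shuffled_variants(hypothesis, n_perms, rng)):
            kept += model(p, h) == original
            total += 1
    return kept / total if total else float("nan")


def bag_of_words_model(premise: str, hypothesis: str) -> str:
    """Hypothetical order-blind classifier: entailment iff all hypothesis words occur in the premise."""
    return "entailment" if set(hypothesis.split()) <= set(premise.split()) else "neutral"


pairs = [
    ("a man is playing a guitar on stage", "a man is playing"),
    ("the dog chased the cat", "the cat chased the dog"),
]
print(permutation_acceptance(bag_of_words_model, pairs))  # 1.0
```

Note that the toy model already labels the second pair as entailment even without shuffling. Ignoring word order is exactly what makes a model invariant to permutations.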
I also enjoyed the following two papers, which developed new methods and frameworks for understanding model behaviour:
Cross-lingual transfer and multilingual NLP
Beyond machine translation, I enjoyed the following papers on cross-lingual transfer and multilingual NLP:
Challenges in natural language generation
Natural language generation (NLG) is one of the most challenging settings in NLP. Some papers I enjoyed focused on the challenges of different NLG applications:
Virtual conference notes
Finally, I want to share some brief notes to add to the ongoing conversation about the format of virtual conferences. I was mainly looking forward to attending the poster sessions, as these are usually my conference highlight (along with the social interactions). There were two in my timezone. Each poster session consisted of a large number of tracks presented at the same time, which left considerably less time to explore and talk to poster presenters from other areas.
Specific posters were hard to find, as the virtual poster spaces showed neither the title nor the authors of a poster. In addition, the space between posters was small, so audio from neighbouring posters with large crowds would get mixed.
In the future, I would like to see poster sessions that are:
- more numerous, with each covering only a few tracks;
- spread out throughout the day and across timezones;
- easy to navigate, with enough space between posters.
Two other things that would have improved my virtual conference experience are a) a chat system that is more seamlessly integrated into the conference platform and b) a tighter integration between the conference platform and the ACL Anthology (linking to the papers in the Anthology would be nice).
Attending the Zoom sessions for paper presentations went well, and I enjoyed watching the recordings of other talks and keynotes.