Hate Speech Detection - Evalita 2020

HaSpeeDe2@Evalita 2020

The HaSpeeDe 2 (Hate Speech Detection) shared task will be organized within Evalita 2020, the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian, which will be held online on 16th-17th December 2020.

Introduction and Motivation

Online hateful content, or Hate Speech (HS), is characterized by some key aspects (such as virality or presumed anonymity) which distinguish it from offline communication and make it potentially more dangerous and hurtful. Therefore, its identification has become a crucial goal in many fields.

From an NLP perspective, much attention has been paid to HS and its identification, together with related phenomena such as offensive and abusive language. This is shown by the proliferation, especially in the last few years, of contributions on the topic ([9], [3], [11], [12], [16], to name a few), of corpora and lexica (e.g. [13], [15], [2]), of dedicated workshops, and of shared tasks within national (GermEval, HASOC, IberLEF) and international (SemEval) evaluation campaigns (see in particular [1]).

The previous edition of EVALITA [6] hosted HaSpeeDe [4], the first shared task on HS detection in social media for Italian. Its high participation and promising results encouraged us to propose a second run at EVALITA 2020. This edition introduces novelties along three major lines:

Language variety and test of time: we will provide a new HS dataset (binary task) based on Twitter data, accompanied by a test set that includes both in-domain and out-of-domain data (tweets and news headlines) as well as data from different time periods.
Stereotypical communication: an error analysis of the main systems on the HaSpeeDe 2018 dataset [10] showed that the occurrence of stereotypes is a common source of error in HS identification. We want to investigate the use of stereotypes in communication, regardless of whether they occur in hateful contexts.
Syntactic realisation of HS: analysis of the POP-HS-IT corpus [7] suggests that the most hateful parts of HS messages are often verbless sentences or verbless fragments, also known as nominal utterances (NUs) [8]. We therefore include a task aimed at identifying NUs in hateful messages.

The ultimate goal of this edition of HaSpeeDe is thus to advance the state of the art of HS detection for Italian, while also exploring related phenomena, the extent to which they can be distinguished from HS, and whether and how well automatic systems can draw such distinctions.

Task description

This second edition focuses on three phenomena related to Hate Speech, especially on Twitter, which are addressed in three tasks:

Task A - Hate Speech Detection (MAIN TASK): binary classification task aimed at determining whether the message contains Hate Speech or not

Task B - Stereotype Detection (Pilot Task 1): binary classification task aimed at determining whether the message contains a stereotype or not

Task C - Identification of Nominal Utterances (Pilot Task 2): sequence labeling task aimed at recognizing Nominal Utterances in hateful tweets
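Task C is framed as sequence labeling. The call does not specify the annotation or submission format, so what follows is only a sketch under an assumption: that NUs are annotated as token spans and encoded with the common BIO scheme (B- on the first token of an NU, I- on the following tokens, O elsewhere). The helper names `spans_to_bio` and `bio_to_spans`, the (start, end) span convention with an exclusive end, the `NU` label, and the example tweet are all hypothetical, not the official HaSpeeDe 2 Task C format.

```python
from typing import List, Optional, Tuple

# Hypothetical span convention: (start, end) token indices, end exclusive.
Span = Tuple[int, int]


def spans_to_bio(tokens: List[str], spans: List[Span], label: str = "NU") -> List[str]:
    """Encode NU spans as BIO tags: B- on the first token, I- on the rest, O elsewhere."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"invalid span ({start}, {end}) for {len(tokens)} tokens")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into (start, end) spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:  # a new B- closes the previous span
                spans.append((start, i))
            start = i
        elif tag.startswith("I-") and start is not None:
            continue  # still inside the current span
        else:
            # An O tag closes any open span; a stray I- with no open span is skipped.
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:  # span running to the end of the tweet
        spans.append((start, len(tags)))
    return spans


if __name__ == "__main__":
    # Invented example: the verbless fragment "Che vergogna" ("What a shame") marked as an NU.
    tokens = ["Che", "vergogna", "!", "Nessuno", "fa", "niente", "."]
    tags = spans_to_bio(tokens, [(0, 2)])
    print(list(zip(tokens, tags)))
    print(bio_to_spans(tags))  # [(0, 2)]
```

Round-tripping spans through BIO tags this way is a quick sanity check that adjacent NUs stay separate, since each one starts with its own B- tag.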

Teams may participate in all three tasks (Task A, Task B and Task C), in two tasks (Task A and Task B, or Task A and Task C), or in Task A only.
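Put differently, Task A is mandatory and Tasks B and C are optional additions. The sketch below encodes the four permitted combinations as a set-membership check, as a hypothetical registration-form validator might; the names `ALLOWED_COMBINATIONS` and `is_valid_participation` are illustrative only, not part of any official HaSpeeDe 2 tooling.

```python
from typing import FrozenSet, Iterable

# Permitted task combinations: Task A is always required,
# Tasks B and C are optional additions.
ALLOWED_COMBINATIONS: FrozenSet[FrozenSet[str]] = frozenset({
    frozenset({"A"}),
    frozenset({"A", "B"}),
    frozenset({"A", "C"}),
    frozenset({"A", "B", "C"}),
})


def is_valid_participation(tasks: Iterable[str]) -> bool:
    """Return True if the requested tasks form one of the permitted combinations."""
    # Normalize labels so that " a" and "A" count as the same task; duplicates collapse.
    requested = frozenset(t.strip().upper() for t in tasks)
    return requested in ALLOWED_COMBINATIONS


if __name__ == "__main__":
    for request in (["A"], ["A", "C"], ["a", "b", "c"], ["B"], ["B", "C"], []):
        print(request, is_valid_participation(request))
```

Listing the combinations explicitly, rather than testing "contains A and is a subset of {A, B, C}", keeps the check a literal mirror of the rule, so any future change to the allowed combinations is a one-line edit.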

Important dates

29th May 2020: development data available to participants for Tasks A and B, plus a 100-tweet data sample for Task C

~~1st June 2020: complete dataset available for Task C~~
~~4th September 2020~~: registration closes
18th-25th September 2020: (NEW!) evaluation window and collection of participants' results
16th October 2020: deadline for submission of system description papers
6th November 2020: final report of participants due to task organizers (camera-ready)
27th November 2020: (NEW!) video presentations due to the Evalita chair
16th-17th December 2020: (NEW!) final workshop (online)

References

[1] Valerio Basile, Cristina Bosco, Elisabetta Fersini, Debora Nozza, Viviana Patti, Francisco Rangel, Paolo Rosso, and Manuela Sanguinetti. SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter. In Proceedings of SemEval 2019, pages 54–63. Association for Computational Linguistics, 2019.

[2] Elisa Bassignana, Valerio Basile, and Viviana Patti. Hurtlex: A Multilingual Lexicon of Words to Hurt. In Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), pages 1–6. CEUR.org, 2018.

[3] Aditya Bohra, Deepanshu Vijay, Vinay Singh, Syed S Akhtar, and Manish Shrivastava. A Dataset of Hindi-English Code-Mixed Social Media Text for Hate Speech Detection. In Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media, pages 36–41. Association for Computational Linguistics (ACL), 2018.

[4] Cristina Bosco, Felice Dell’Orletta, Fabio Poletto, Manuela Sanguinetti, and Maurizio Tesconi. Overview of the EVALITA 2018 Hate Speech Detection Task. In Proceedings of the 6th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA’18), 2018.

[5] Arthur TE Capozzi, Mirko Lai, Valerio Basile, Fabio Poletto, Manuela Sanguinetti, Cristina Bosco, Viviana Patti, Giancarlo Ruffo, Cataldo Musto, Marco Polignano, et al. Computational linguistics against hate: Hate speech detection and visualization on social media in the "Contro L’Odio" project. In 6th Italian Conference on Computational Linguistics, CLiC-it 2019, volume 2481, pages 1–6. CEUR-WS, 2019.

[6] Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso. EVALITA 2018: Overview of the 6th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. In Proceedings of Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018). CEUR.org, 2018.

[7] Gloria Comandini and Viviana Patti. An Impossible Dialogue! Nominal Utterances and Populist Rhetoric in an Italian Twitter Corpus of Hate Speech against Immigrants. In Proceedings of the Third Workshop on Abusive Language Online, pages 163–171. Association for Computational Linguistics, 2019.

[8] Gloria Comandini, Manuela Speranza, and Bernardo Magnini. Effective Communication without Verbs? Sure! Identification of Nominal Utterances in Italian Social Media Texts. In Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), Torino, Italy, December 10-12, 2018, volume 2253 of CEUR Workshop Proceedings. CEUR-WS.org, 2018.

[9] Paula Fortuna, João Rocha da Silva, Juan Soler-Company, Leo Wanner, and Sérgio Nunes. A Hierarchically-Labeled Portuguese Hate Speech Dataset. In Proceedings of the Third Workshop on Abusive Language Online, pages 94–104, Florence, Italy, August 2019. Association for Computational Linguistics.

[10] Chiara Francesconi, Cristina Bosco, Fabio Poletto, and Manuela Sanguinetti. Error Analysis in a Hate Speech Detection Task: The case of HaSpeeDe-TW at EVALITA 2018. In CLiC-it, volume 2481 of CEUR Workshop Proceedings. CEUR-WS.org, 2019.

[11] Lei Gao, Alexis Kuppersmith, and Ruihong Huang. Recognizing Explicit and Implicit Hate Speech Using a Weakly Supervised Two-path Bootstrapping Approach. CoRR, abs/1710.07394:774–782, 2017.

[12] David Jurgens, Eshwar Chandrasekharan, and Libby Hemphill. A Just and Comprehensive Strategy for Using NLP to Address Online Abuse. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3658–3666. Association for Computational Linguistics (ACL), 2019.

[13] Rogers De Pelle and Viviane P Moreira. Offensive Comments in the Brazilian Web: a Dataset and Baseline Results. In Proceedings of the Fifth Brazilian Workshop on Social Network Analysis and Mining (BraSNAM 2016), pages 510–519, 2016.

[14] Fabio Poletto, Marco Stranisci, Manuela Sanguinetti, Viviana Patti, and Cristina Bosco. Hate Speech Annotation: Analysis of an Italian Twitter Corpus. In Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017). CEUR, December 2017.

[15] Manuela Sanguinetti, Fabio Poletto, Cristina Bosco, Viviana Patti, and Marco Stranisci. An Italian Twitter Corpus of Hate Speech against Immigrants. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC’18), pages 2798–2895. European Language Resources Association (ELRA), 2018.

[16] Tommaso Caselli, Valerio Basile, Jelena Mitrović, Inga Kartoziya, and Michael Granitzer. I Feel Offended, Don’t Be Abusive! Implicit/Explicit Messages in Offensive and Abusive Language. In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), 2020.
