
Challenges of Automatically Detecting Offensive Language Online: Participation Paper for the Germeval Shared Task 2018 (HaUA)

    Tom De Smedt, Sylvia Jaki

konvens 2018 - GermEval Proceedings, pp. 27-32, 2018/10/02

14th Conference on Natural Language Processing - KONVENS 2018



Abstract

This paper presents our submission (HaUA) for Germeval Shared Task 1 (Binary Classification) on the identification of offensive language. With feature selection and features such as character n-grams, offensive word lexicons, and sentiment polarity, our SVM classifier is able to distinguish between offensive and non-offensive German-language tweets with an in-domain F1 score of 88.9%. In this paper, we report our methodology and discuss machine learning problems such as imbalance, overfitting, and the interpretability of machine learning algorithms. In the discussion section, we also briefly go beyond the technical perspectives and argue for a thorough discussion of the dilemma between internet security and freedom of speech, and what kind of language we are actually predicting with such algorithms.
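
For readers unfamiliar with two terms used in the abstract, the following is a minimal illustrative sketch, not the authors' pipeline. It shows how character n-grams can be counted from a tweet and how an F1 score is derived from classification counts. The function names `char_ngrams` and `f1_score`, the choice of n = 3, and the example counts are assumptions made purely for illustration; the paper's actual feature set, SVM configuration, and evaluation setup are not reproduced here.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a lowercased string.

    Hypothetical helper: n=3 is an assumed value, not taken from the paper.
    """
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def f1_score(tp, fp, fn):
    """Return F1, the harmonic mean of precision and recall.

    tp, fp, fn are true positives, false positives, and false negatives
    for the class of interest (for example, "offensive").
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative inputs only; these are not results from the paper.
features = char_ngrams("Hallo Welt")
score = f1_score(tp=8, fp=2, fn=2)
```

In a full classifier, n-gram counts of this kind would typically be combined with other features (such as lexicon matches and sentiment polarity, as the abstract mentions) before being passed to a learning algorithm.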