
upInf - Offensive Language Detection in German Tweets

    Bastian Birkeneder, Jelena Mitrovic, Julia Niemeier, Leon Teubert, Siegfried Handschuh

KONVENS 2018 - GermEval Proceedings, pp. 71-79, 2018/10/02

14th Conference on Natural Language Processing - KONVENS 2018


Abstract

As part of the GermEval 2018 shared task, we developed a system that detects offensive speech in German tweets. To enlarge the existing training set, we built an application that gathers trending tweets in Germany and assists in their manual annotation. The main part of the training data consists of the set provided by the organizers of the shared task. We implement three models. The first follows an n-gram approach. The second uses word vectors to create word clusters, which contribute a new set of features. The third combines a recurrent and a convolutional neural network. We evaluate our approaches by splitting the given data into training, validation, and test sets. The final evaluation is carried out by the task organizers, who compare our predictions with the unpublished ground truth.
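
As a rough illustration of two steps the abstract mentions, the n-gram features and the train/validation/test split, the following is a minimal sketch only. It does not reproduce the authors' system: the use of character n-grams, the n-gram range, the split fractions, and the function names `char_ngrams` and `split_data` are all assumptions made for this example.

```python
import random
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Count character n-grams of lengths n_min..n_max in a lowercased text.

    Character n-grams and the default range are assumptions for illustration;
    the abstract only states that an n-gram approach is used.
    """
    text = text.lower()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def split_data(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle items reproducibly and split them into train, validation, test.

    The fractions are hypothetical; the abstract does not state the ratios.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_val = int(len(items) * val_frac)
    n_test = int(len(items) * test_frac)
    val = items[:n_val]
    test = items[n_val:n_val + n_test]
    train = items[n_val + n_test:]
    return train, val, test
```

In such a setup, the n-gram counts of each tweet would serve as the feature vector for a classifier, and the validation set would be used for model selection before predictions are produced for the held-out test data.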