
GermEval 2018: Machine Learning and Neural Network Approaches for Offensive Language Identification

    Pruthwik Mishra, Vandan Mujadia, Soujanya Lanka

konvens 2018 - GermEval Proceedings, pp. 138-143, 2018/10/02

14th Conference on Natural Language Processing - KONVENS 2018



Abstract

Social media has been an effective carrier of information since its inception. People worldwide can interact and communicate freely, with little hassle, thanks to its wide reach. Although this mode of communication has many advantages, its severe drawbacks cannot be ignored. One of them is the rampant use of offensive language in the form of hurtful, derogatory or obscene comments. There is a growing need for checks on social media websites to curb offensive language. GermEval Task 2018 is an initiative in this direction that aims to automatically identify offensive language in German Twitter posts. In this paper, we describe our approaches to the different subtasks of GermEval Task 2018. We explored two kinds of approaches for these subtasks: machine learning and neural networks. We observed that Support Vector Machine (SVM) approaches using character n-grams outperformed their neural network counterparts most of the time. The machine learning approaches used TF-IDF features over character n-grams, while the neural networks used word embeddings. We submitted the outputs of three runs, all using SVM: one run for Task 1 and two for Task 2.
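
As an illustration of the TF-IDF character n-gram features mentioned in the abstract, here is a minimal pure-Python sketch. It is not the authors' implementation: the n-gram range (2 to 4), the lowercasing, the smoothed IDF formula, the L2 normalisation, the function names `char_ngrams` and `tfidf_vectors`, and the toy sentences are all assumptions made for this example. The resulting sparse vectors are the kind of input one would typically pass to a linear SVM.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Return all character n-grams of length n_min..n_max from lowercased text."""
    text = text.lower()
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n_min=2, n_max=4):
    """Map each document to a sparse {ngram: weight} dict with unit L2 norm."""
    counts = [Counter(char_ngrams(d, n_min, n_max)) for d in docs]
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter(g for c in counts for g in c)
    n_docs = len(docs)
    vectors = []
    for c in counts:
        # Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1
        vec = {g: tf * (math.log((1 + n_docs) / (1 + df[g])) + 1.0)
               for g, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({g: w / norm for g, w in vec.items()})
    return vectors


# Toy placeholder inputs, not drawn from the GermEval data.
docs = ["das ist toll", "das ist doof", "guten morgen"]
vectors = tfidf_vectors(docs)
```

In this sketch, an n-gram that occurs in fewer documents (such as "to" in the first toy sentence) receives a higher weight than one shared across documents (such as "da"). Character n-grams are often chosen for noisy text such as tweets because they tolerate misspellings and creative spellings better than whole-word features.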