KONVENS 2018 - GermEval Proceedings, pp. 113-119, 2018/10/02
14th Conference on Natural Language Processing - KONVENS 2018
This paper describes the entry hshl_coarse_1.txt for Task I (Binary Classification) of GermEval 2018, the Shared Task on the Identification of Offensive Language. In this task, German tweets are classified as either offensive or non-offensive. The entry uses a task-specific classifier built on top of a medium-specific language model, which is in turn built on top of a universal language model. The approach uses a deep recurrent neural network, specifically the AWD-LSTM architecture. The universal language model was trained on 100 million unlabeled articles from the German Wikipedia, and the medium-specific language model was trained on 303,256 unlabeled tweets. The classifier was trained on the labeled tweets provided by the organizers of the shared task.
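The three-stage layering described above can be sketched as a small dependency plan. This is only an illustration of the training order, not the implementation used for the entry: it contains no neural network, and the stage names (`universal_lm`, `medium_lm`, `classifier`) and the `Stage` and `training_order` helpers are hypothetical. The corpus descriptions are taken from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    """One training stage in the three-stage transfer pipeline."""
    name: str                  # hypothetical identifier for the stage
    data: str                  # corpus the stage is trained on
    labeled: bool              # whether the stage needs class labels
    init_from: Optional[str]   # stage whose weights initialise this one


# Stage names are assumptions; corpus descriptions follow the abstract.
PIPELINE = [
    Stage("universal_lm", "German Wikipedia (unlabeled)", False, None),
    Stage("medium_lm", "303,256 unlabeled tweets", False, "universal_lm"),
    Stage("classifier", "labeled shared-task tweets", True, "medium_lm"),
]


def training_order(stages: List[Stage]) -> List[str]:
    """Return stage names so that each stage comes after the one it builds on."""
    done: List[str] = []
    pending = list(stages)
    while pending:
        # A stage is ready once the stage it is initialised from has been trained.
        ready = [s for s in pending
                 if s.init_from is None or s.init_from in done]
        if not ready:
            raise ValueError("cyclic or missing init_from reference")
        for stage in ready:
            done.append(stage.name)
            pending.remove(stage)
    return done
```

Calling `training_order(PIPELINE)` returns `['universal_lm', 'medium_lm', 'classifier']`, which mirrors the description: only the last stage needs labels, and each stage starts from the model trained in the stage before it.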