Josef Ruppenhofer - Melanie Siegel - Michael Wiegand (Eds.)

Proceedings of the GermEval 2018 Workshop

14th Conference on Natural Language Processing - KONVENS 2018

Verlag der Österreichischen Akademie der Wissenschaften
Austrian Academy of Sciences Press

A-1011 Wien, Dr. Ignaz Seipel-Platz 2
Tel. +43-1-515 81/DW 3420, Fax +43-1-515 81/DW 3400
https://verlag.oeaw.ac.at, e-mail: verlag@oeaw.ac.at

The Conference on Natural Language Processing (KONVENS) aims to foster a high-level exchange of experience through presentations of basic research in computational linguistics and selected practice-oriented talks by experts.

Copyright Cover: Melanie Siegel

ISBN 978-3-7001-8435-5
Online Edition

TUWienKBS at GermEval 2018: German Abusive Tweet Detection

Joaquín Padilla Montani, Peter Schüller

KONVENS 2018 - GermEval Proceedings, pp. 45-50, 2018/10/02

Abstract

The TUWienKBS system for abusive tweet detection in the GermEval 2018 competition is a stacked classifier. It uses five disjoint feature sets: token n-grams; character n-grams; relatedness to the tokens that TF-IDF ranks as most important within each class; relatedness to the character n-grams that TF-IDF ranks as most important within each class; and the average of the embedding vectors of all tokens in a tweet. Three base classifiers (one maximum entropy model and two random forest ensembles) are trained independently on each feature set, which yields 3 × 5 = 15 predictions for the type and/or level of abusiveness of a given tweet. A single maximum entropy meta-level classifier combines these predictions into the final classification. As a word-embedding fallback for out-of-vocabulary tokens, we use the embeddings of the longest prefix and the longest suffix of the token, if such embeddings can be found.
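The embedding feature and its out-of-vocabulary fallback can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `min_len` cutoff, the choice to average the prefix and suffix vectors into one token vector, and the zero vector for tweets without any known token are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def longest_known_prefix(token: str, embeddings: Dict[str, Vector],
                         min_len: int = 3) -> Optional[str]:
    """Return the longest proper prefix of `token` that has an embedding.

    `min_len` is a hypothetical cutoff to avoid matching very short pieces.
    """
    for end in range(len(token) - 1, min_len - 1, -1):
        piece = token[:end]
        if piece in embeddings:
            return piece
    return None


def longest_known_suffix(token: str, embeddings: Dict[str, Vector],
                         min_len: int = 3) -> Optional[str]:
    """Return the longest proper suffix of `token` that has an embedding."""
    for start in range(1, len(token) - min_len + 1):
        piece = token[start:]
        if piece in embeddings:
            return piece
    return None


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty sequence of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def token_vector(token: str, embeddings: Dict[str, Vector],
                 min_len: int = 3) -> Optional[Vector]:
    """Look up a token; for unknown tokens fall back to prefix and suffix."""
    if token in embeddings:
        return embeddings[token]
    pieces = [
        longest_known_prefix(token, embeddings, min_len),
        longest_known_suffix(token, embeddings, min_len),
    ]
    found = [embeddings[p] for p in pieces if p is not None]
    # Assumption: the prefix and suffix vectors are averaged into one vector.
    return mean(found) if found else None


def tweet_vector(tokens: Sequence[str], embeddings: Dict[str, Vector],
                 dim: int) -> Vector:
    """Average the vectors of all tokens for which a vector could be found."""
    vectors = []
    for token in tokens:
        vec = token_vector(token, embeddings)
        if vec is not None:
            vectors.append(vec)
    # Assumption: a tweet without any usable token maps to the zero vector.
    return mean(vectors) if vectors else [0.0] * dim
```

With a toy vocabulary containing "haus" and "boot", the unknown compound "hausboot" would receive the average of the two known vectors, which is the kind of case the fallback appears to target in German, where compounds are frequent.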

Published Online: 2018/10/02 12:00:00

Object Identifier: 0xc1aa5572 0x003a10e0
