Lexical Normalization of Spanish Tweets with Preprocessing Rules, Domain-Specific Edit Distances, and Language Models

Bibliographic Details
Title: Lexical Normalization of Spanish Tweets with Preprocessing Rules, Domain-Specific Edit Distances, and Language Models
Authors: Pablo Ruiz Fabo, Montse Cuadros, Thierry Etchegoyhen
Contributors: Lattice - Langues, Textes, Traitements informatiques, Cognition - UMR 8094 (Lattice), Département Littératures et langage - ENS Paris (LILA), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Université Sorbonne Paris Cité (USPC)-Université Sorbonne Nouvelle - Paris 3, VicomTech
Source: Proceedings of the Tweet Normalization Workshop at SEPLN 2013. IV Congreso Español de Informática
Proceedings of the Tweet Normalization Workshop at SEPLN 2013. IV Congreso Español de Informática, Sep 2013, Madrid, Spain
Scopus-Elsevier
HAL
Publication Info: HAL CCSD, 2013.
Publication Year: 2013
Subject Terms: [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing, Twitter, [INFO.INFO-WB]Computer Science [cs]/Web, language model, edit distance, lexical normalization, Spanish microtext, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
Description: International audience. We present a system to normalize Spanish tweets, which uses preprocessing rules, a domain-appropriate edit-distance model, and language models to select correction candidates based on context. The system's results at the SEPLN 2013 Tweet-Norm task were above average.
Language: English
Access URL: https://explore.openaire.eu/search/publication?articleId=dedup_wf_001::421d8ea40fcfb52df08411e1e9940585
https://hal.archives-ouvertes.fr/hal-01099250/document
Accession Number: edsair.dedup.wf.001..421d8ea40fcfb52df08411e1e9940585
Database: OpenAIRE