Lexical Normalization of Spanish Tweets with Preprocessing Rules, Domain-Specific Edit Distances, and Language Models
Title: | Lexical Normalization of Spanish Tweets with Preprocessing Rules, Domain-Specific Edit Distances, and Language Models |
---|---|
Authors: | Pablo Ruiz Fabo, Montse Cuadros, Thierry Etchegoyhen |
Contributors: | Lattice - Langues, Textes, Traitements informatiques, Cognition - UMR 8094 (Lattice), Département Littératures et langage - ENS Paris (LILA), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)-Université Sorbonne Paris Cité (USPC)-Université Sorbonne Nouvelle - Paris 3, VicomTech |
Source: | Proceedings of the Tweet Normalization Workshop at SEPLN 2013. IV Congreso Español de Informática, Sep 2013, Madrid, Spain. Scopus-Elsevier; HAL |
Publication Information: | HAL CCSD, 2013. |
Publication Year: | 2013 |
Subject Terms: | [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing, Twitter, [INFO.INFO-WB]Computer Science [cs]/Web, language model, edit distance, lexical normalization, Spanish microtext, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] |
Description: | International audience; We present a system to normalize Spanish tweets that uses preprocessing rules, a domain-appropriate edit-distance model, and language models to select correction candidates based on context. The system's results in the SEPLN 2013 Tweet-Norm task were above average. |
Language: | English |
Access URL: | https://explore.openaire.eu/search/publication?articleId=dedup_wf_001::421d8ea40fcfb52df08411e1e9940585 https://hal.archives-ouvertes.fr/hal-01099250/document |
Accession Number: | edsair.dedup.wf.001..421d8ea40fcfb52df08411e1e9940585 |
Database: | OpenAIRE |