Full bibliography
Modeling Orthographic Variation in Occitan's Dialects
Resource type
Conference Paper
Authors/contributors
- Hopton, Zachary (Author)
- Aepli, Noëmi (Author)
- Scherrer, Yves (Editor)
- Jauhiainen, Tommi (Editor)
- Ljubešić, Nikola (Editor)
- Zampieri, Marcos (Editor)
- Nakov, Preslav (Editor)
- Tiedemann, Jörg (Editor)
Title
Modeling Orthographic Variation in Occitan's Dialects
Abstract
Effectively normalizing spellings in textual data poses a considerable challenge, especially for low-resource languages lacking standardized writing systems. In this study, we fine-tuned a multilingual model with data from several Occitan dialects and conducted a series of experiments to assess the model's representations of these dialects. For evaluation purposes, we compiled a parallel lexicon encompassing four Occitan dialects. Intrinsic evaluations of the model's embeddings revealed that surface similarity between the dialects strengthened representations. When the model was further fine-tuned for part-of-speech tagging, its performance was robust to dialectical variation, even when trained solely on part-of-speech data from a single dialect. Our findings suggest that large multilingual models minimize the need for spelling normalization during pre-processing.
Date
2024-06
Proceedings Title
Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024)
Place
Mexico City, Mexico
Publisher
Association for Computational Linguistics
Pages
78–88
Accessed
31/07/2024 15:20
Library Catalog
ACLWeb
Citation
Hopton, Z., & Aepli, N. (2024). Modeling Orthographic Variation in Occitan’s Dialects. In Y. Scherrer, T. Jauhiainen, N. Ljubešić, M. Zampieri, P. Nakov, & J. Tiedemann (Eds.), Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024) (pp. 78–88). Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.vardial-1.6