Modeling Orthographic Variation in Occitan's Dialects

Hopton, Zachary; Aepli, Noëmi

doi:10.18653/v1/2024.vardial-1.6

Resource Type

Conference Paper

Authors/Contributors

Hopton, Zachary (Author)
Aepli, Noëmi (Author)
Scherrer, Yves (Editor)
Jauhiainen, Tommi (Editor)
Ljubešić, Nikola (Editor)
Zampieri, Marcos (Editor)
Nakov, Preslav (Editor)
Tiedemann, Jörg (Editor)

Title

Modeling Orthographic Variation in Occitan's Dialects

Abstract

Effectively normalizing spellings in textual data poses a considerable challenge, especially for low-resource languages lacking standardized writing systems. In this study, we fine-tuned a multilingual model with data from several Occitan dialects and conducted a series of experiments to assess the model's representations of these dialects. For evaluation purposes, we compiled a parallel lexicon encompassing four Occitan dialects. Intrinsic evaluations of the model's embeddings revealed that surface similarity between the dialects strengthened representations. When the model was further fine-tuned for part-of-speech tagging, its performance was robust to dialectical variation, even when trained solely on part-of-speech data from a single dialect. Our findings suggest that large multilingual models minimize the need for spelling normalization during pre-processing.

Date

2024-06

Proceedings Title

Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024)

Place

Mexico City, Mexico

Publisher

Association for Computational Linguistics

Pages

78–88

DOI

10.18653/v1/2024.vardial-1.6

URL

https://aclanthology.org/2024.vardial-1.6

Accessed

31/07/2024 15:20

Library Catalog

ACLWeb

Reference

Hopton, Z., & Aepli, N. (2024). Modeling Orthographic Variation in Occitan’s Dialects. In Y. Scherrer, T. Jauhiainen, N. Ljubešić, M. Zampieri, P. Nakov, & J. Tiedemann (Eds.), Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024) (pp. 78–88). Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.vardial-1.6

Language

Occitan

Document

Hopton et Aepli - 2024 - Modeling Orthographic Variation in Occitan's Diale.pdf

Link to this record

https://colaf.huma-num.fr/bibliography/X4BR3RRU
