Votre recherche

Réinitialiser la recherche

Type de papier

Etat de l'art

Résultats 3 ressources

Résumés

Bernhard, D., Vergez-Couret, M., & Dupuy, E. (2024). Au-delà des normes : identifier et documenter les langues minorisées pour le traitement automatique des langues. Cahiers Du Plurilinguisme Européen, 16. https://doi.org/10.57086/cpe.1710

Dieser Artikel stellt Überlegungen zu den Herausforderungen der Dokumentation von Minderheitensprachen im digitalen Raum an, ausgehend von den Arbeiten, die im Rahmen des DIVITAL-Projekts durchgeführt wurden. Die ersten Arbeiten des Projekts betrafen die Sammlung von Korpora und ihre Dokumentation durch feinkörnige Metadaten. Diese Arbeiten haben zwei große Herausforderungen aufgezeigt: (i) die Identifizierung der Sprachen und ihrer Varianten im Rahmen der Normen für die Kodierung von Sprachnamen und (ii) die Schaffung neuer Ressourcen in Verbindung mit der aktuellen Praxis dieser Sprachen.

Consulter le document
Çelikkol, M., Körber, L., & Zhao, W. (2024). Exploring Diachronic and Diatopic Changes in Dialect Continua: Tasks, Datasets and Challenges (No. arXiv:2407.04010). arXiv. https://doi.org/10.48550/arXiv.2407.04010

Everlasting contact between language communities leads to constant changes in languages over time, and gives rise to language varieties and dialects. However, the communities speaking non-standard language are often overlooked by non-inclusive NLP technologies. Recently, there has been a surge of interest in studying diatopic and diachronic changes in dialect NLP, but there is currently no research exploring the intersection of both. Our work aims to fill this gap by systematically reviewing diachronic and diatopic papers from a unified perspective. In this work, we critically assess nine tasks and datasets across five dialects from three language families (Slavic, Romance, and Germanic) in both spoken and written modalities. The tasks covered are diverse, including corpus construction, dialect distance estimation, and dialect geolocation prediction, among others. Moreover, we outline five open challenges regarding changes in dialect use over time, the reliability of dialect datasets, the importance of speaker characteristics, limited coverage of dialects, and ethical considerations in data collection. We hope that our work sheds light on future research towards inclusive computational methods and datasets for language varieties and dialects.

Consulter le document
Joshi, A., Dabre, R., Kanojia, D., Li, Z., Zhan, H., Haffari, G., & Dippold, D. (2024, January 11). Natural Language Processing for Dialects of a Language: A Survey. ArXiv.Org. https://arxiv.org/abs/2401.05632v3

State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets. This survey delves into an important attribute of these datasets: the dialect of a language. Motivated by the performance degradation of NLP models for dialectic datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches. We describe a wide range of NLP tasks in terms of two categories: natural language understanding (NLU) (for tasks such as dialect classification, sentiment analysis, parsing, and NLU benchmarks) and natural language generation (NLG) (for summarisation, machine translation, and dialogue systems). The survey is also broad in its coverage of languages which include English, Arabic, German among others. We observe that past work in NLP concerning dialects goes deeper than mere dialect classification, and . This includes early approaches that used sentence transduction that lead to the recent approaches that integrate hypernetworks into LoRA. We expect that this survey will be useful to NLP researchers interested in building equitable language technologies by rethinking LLM benchmarks and model architectures.

Consulter le document

Flux web personnalisé

Dernière mise à jour depuis la base de données : 23/06/2025 15:08 (UTC)

Votre recherche

Résultats 3 ressources

Explorer

Langue

Type de papier