Your search

In authors or contributors

"Vergez-Couret, Marianne"

Task

Morphosyntactic annotation

Results: 3 resources

Abstracts

Vergez-Couret, M., & Urieli, A. (2014). POS-tagging different varieties of Occitan with single-dialect resources. In M. Zampieri, L. Tan, N. Ljubešić, & J. Tiedemann (Eds.), Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects (pp. 21–29). Association for Computational Linguistics and Dublin City University. https://doi.org/10.3115/v1/W14-5303

Garcia Holgado, C., & Vergez-Couret, M. (2024). Empowering Low-Resource Regional Languages with Lexicons: A Comparative Study of NLP Tools for Morphosyntactic Analysis. In N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, & N. Xue (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 5747–5756). ELRA and ICCL. https://aclanthology.org/2024.lrec-main.510

We investigate the effect of integrating lexicon information to an extremely low-resource language when annotated data is scarce for morpho-syntactic analysis. Obtaining such data and linguistic resources for these languages are usually constrained by a lack of human and financial resources making this task particularly challenging. In this paper, we describe the collection and leverage of a bilingual lexicon for Poitevin-Saintongeais, a regional language of France, to create augmented data through a neighbor-based distributional method. We assess this lexicon-driven approach in improving POS tagging while using different lexicon and augmented data sizes. To evaluate this strategy, we compare two distinct paradigms: neural networks, which typically require extensive data, and a conventional probabilistic approach, in which a lexicon is instrumental in its performance. Our findings reveal that the lexicon is a valuable asset for all models, but in particular for neural, demonstrating an enhanced generalization across diverse classes without requiring an extensive lexicon size.

Bernhard, D., Ligozat, A.-L., Martin, F., Bras, M., Magistry, P., Vergez-Couret, M., Steiblé, L., Erhart, P., Hathout, N., Huck, D., Rey, C., Reynés, P., Rosset, S., Sibille, J., & Lavergne, T. (2018, May). Corpora with Part-of-Speech Annotations for Three Regional Languages of France: Alsatian, Occitan and Picard. 11th Edition of the Language Resources and Evaluation Conference. https://hal.science/hal-01704806

This article describes the creation of corpora with part-of-speech annotations for three regional languages of France: Alsatian, Occitan and Picard. These manual annotations were performed in the context of the RESTAURE project, whose goal is to develop resources and tools for these under-resourced French regional languages. The article presents the tagsets used in the annotation process as well as the resulting annotated corpora.

Last database update: 23/06/2025 15:08 (UTC)