Building a Universal Dependencies Treebank for Occitan
Resource Type
Conference Paper
Authors/Contributors
- Miletic, Aleksandra (Author)
- Bras, Myriam (Author)
- Vergez-Couret, Marianne (Author)
- Esher, Louise (Author)
- Poujade, Clamença (Author)
- Sibille, Jean (Author)
- Calzolari, Nicoletta (Editor)
- Béchet, Frédéric (Editor)
- Blache, Philippe (Editor)
- Choukri, Khalid (Editor)
- Cieri, Christopher (Editor)
- Declerck, Thierry (Editor)
- Goggi, Sara (Editor)
- Isahara, Hitoshi (Editor)
- Maegaard, Bente (Editor)
- Mariani, Joseph (Editor)
- Mazo, Hélène (Editor)
- Moreno, Asuncion (Editor)
- Odijk, Jan (Editor)
- Piperidis, Stelios (Editor)
Title
Building a Universal Dependencies Treebank for Occitan
Abstract
This paper outlines the ongoing effort of creating the first treebank for Occitan, a low-resourced regional language spoken mainly in the south of France. We briefly present the global context of the project and report on its current status. We adopt the Universal Dependencies framework for this project. Our methodology is based on two main principles. Firstly, in order to guarantee the annotation quality, we use the agile annotation approach. Secondly, we rely on pre-processing using existing tools (taggers and parsers) to facilitate the work of human annotators, mainly through a delexicalized cross-lingual parsing approach. We present the results available at this point (annotation guidelines and a sub-corpus annotated with PoS tags and lemmas) and give the timeline for the rest of the work.
Date
2020-05
Proceedings Title
Proceedings of the Twelfth Language Resources and Evaluation Conference
Conference Name
LREC 2020
Place
Marseille, France
Publisher
European Language Resources Association
Pages
2932–2939
Language
English
ISBN
979-10-95546-34-4
Accessed
12/11/2024 09:33
Library Catalog
ACLWeb
Reference
Miletic, A., Bras, M., Vergez-Couret, M., Esher, L., Poujade, C., & Sibille, J. (2020). Building a Universal Dependencies Treebank for Occitan. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the Twelfth Language Resources and Evaluation Conference (pp. 2932–2939). European Language Resources Association. https://aclanthology.org/2020.lrec-1.358