The ParCoLab Parallel Corpus and Its Extension to Four Regional Languages of France
Resource Type
Conference Paper
Authors/Contributors
- Stosic, Dejan (Author)
- Marjanović, Saša (Author)
- Bernhard, Delphine (Author)
- Bras, Myriam (Author)
- Kevers, Laurent (Author)
- Retali-Medori, Stella (Author)
- Vergez-Couret, Marianne (Author)
- Werner, Carole (Author)
- Calzolari, Nicoletta (Editor)
- Kan, Min-Yen (Editor)
- Hoste, Veronique (Editor)
- Lenci, Alessandro (Editor)
- Sakti, Sakriani (Editor)
- Xue, Nianwen (Editor)
Title
The ParCoLab Parallel Corpus and Its Extension to Four Regional Languages of France
Abstract
Parallel corpora are still scarce for most of the world's language pairs. The situation is by no means different for regional languages of France. In addition, adequate web interfaces facilitate and encourage the use of parallel corpora by target users, such as language learners and teachers, as well as linguists. In this paper, we describe ParCoLab, a parallel corpus and a web platform for querying the corpus. From its onset, ParCoLab has been geared towards lower-resource languages, with an initial corpus in Serbian, along with French and English (later Spanish). We focus here on the extension of ParCoLab with a parallel corpus for four regional languages of France: Alsatian, Corsican, Occitan and Poitevin-Saintongeais. In particular, we detail criteria for choosing texts and issues related to their collection. The new parallel corpus contains more than 20k tokens per regional language.
Date
2024-05
Proceedings Title
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Conference Name
LREC-COLING 2024
Place
Torino, Italia
Publisher
ELRA and ICCL
Pages
16014–16023
Accessed
25/05/2024 12:42
Library Catalog
ACLWeb
Citation
Stosic, D., Marjanović, S., Bernhard, D., Bras, M., Kevers, L., Retali-Medori, S., Vergez-Couret, M., & Werner, C. (2024). The ParCoLab Parallel Corpus and Its Extension to Four Regional Languages of France. In N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, & N. Xue (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 16014–16023). ELRA and ICCL. https://aclanthology.org/2024.lrec-main.1392