Managing Fine-grained Metadata for Text Bases in Extremely Low Resource Languages: The Cases of Two Regional Languages of France

Resource Type
Conference Paper
Authors/Contributors
Vergez-Couret, M.; Bernhard, D.; Nauge, M.; Bras, M.; Ruiz Fabo, P.; Werner, C.
Title
Managing Fine-grained Metadata for Text Bases in Extremely Low Resource Languages: The Cases of Two Regional Languages of France
Abstract
Metadata are key components of language resources and facilitate their exploitation and re-use. Their creation is a labour-intensive process and requires a modeling step, which identifies resource-specific information as well as standards and controlled vocabularies that can be reused. In this article, we focus on metadata for documenting text bases for regional languages of France characterised by several levels of variation (space, time, usage, social status), based on a survey of existing metadata schemas. Moreover, we implement our metadata model as a database structure for the Heurist data management system, which combines the ease of use of spreadsheets with the ability of relational databases to model complex relationships between entities. The Heurist template is made freely available and was used to describe metadata for text bases in Alsatian and Poitevin-Saintongeais. We also propose tools to automatically generate XML metadata header files from the database.
Date
2024-05
Proceedings Title
Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024
Place
Torino, Italia
Publisher
ELRA and ICCL
Pages
212–221
Short Title
Managing Fine-grained Metadata for Text Bases in Extremely Low Resource Languages
Accessed
02/08/2024 13:55
Library Catalog
ACLWeb
Reference
Vergez-Couret, M., Bernhard, D., Nauge, M., Bras, M., Ruiz Fabo, P., & Werner, C. (2024). Managing Fine-grained Metadata for Text Bases in Extremely Low Resource Languages: The Cases of Two Regional Languages of France. In M. Melero, S. Sakti, & C. Soria (Eds.), Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024 (pp. 212–221). ELRA and ICCL. https://aclanthology.org/2024.sigul-1.25