Your search

In authors or contributors

"Erhart, Pascale"

Corpus

Text
- Annotated
  - Morphology

Results: 2 resources

Abstracts

Bernhard, D., Erhart, P., Huck, D., & Steiblé, L. (2023). Annotated Corpus for the Alsatian Dialects (Version 3.0). Zenodo. https://doi.org/10.5281/zenodo.10132307

This corpus contains a collection of texts in the Alsatian dialects which were manually annotated with parts-of-speech, lemmas, translations into French and location entities. The corpus was produced in the context of the RESTAURE project, funded by the French ANR. The current version of the corpus contains 21 documents and 12,907 syntactic words. The annotation process is detailed in the following article: http://hal.archives-ouvertes.fr/hal-01704806

Information about version 3

Version 3 corrects some minor errors in the CoNLL-U files: wrong token indexes after multiword tokens and missing _ in glosses. In addition, all files are concatenated into a single CoNLL-U file.

Information about version 2

Version 2 contains the same annotated documents as version 1, but some errors have been corrected and the annotated corpus is provided in the CoNLL-U format. The untokenised and unannotated versions of the documents are found in the "txt" folder. The annotated versions of the documents are found in the "ud" folder (CoNLL-U format). In addition to the form, the lemma and the part-of-speech, additional information is also provided:

- translation of the lemma into French (Gloss field)
- annotation of location names (NamedType field)
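
For readers who want to inspect such a file, the following is a minimal Python sketch of reading CoNLL-U lines and counting syntactic words, which in CoNLL-U are the rows with a plain integer ID (multiword token ranges such as `1-2` are skipped). It assumes that the Gloss and NamedType values are stored as `key=value` pairs in the tenth (MISC) column; the record above does not state where they are stored, so check the actual file. The sample sentence, its glosses, and the `LOC` value are invented for illustration and are not taken from the corpus, and the function names are hypothetical.

```python
from typing import Dict, Iterator

# Invented sample in CoNLL-U layout (10 tab-separated columns per token line).
# Assumption: Gloss and NamedType live in the MISC column as key=value pairs.
SAMPLE = """\
# sent_id = demo-1
1-2\tim\t_\t_\t_\t_\t_\t_\t_\t_
1\tin\tin\tADP\t_\t_\t_\t_\t_\tGloss=dans
2\tdem\tder\tDET\t_\t_\t_\t_\t_\tGloss=le
3\tElsass\tElsass\tPROPN\t_\t_\t_\t_\t_\tGloss=Alsace|NamedType=LOC
"""


def parse_misc(misc: str) -> Dict[str, str]:
    """Split a MISC value such as 'Gloss=dans|NamedType=LOC' into a dict."""
    if misc == "_":
        return {}
    pairs = (item.split("=", 1) for item in misc.split("|") if "=" in item)
    return {key: value for key, value in pairs}


def syntactic_words(conllu_text: str) -> Iterator[Dict[str, str]]:
    """Yield one record per syntactic word, skipping comments and range rows."""
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line (sentence boundary) or comment
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue  # multiword token range (1-2) or empty node (1.1)
        misc = parse_misc(cols[9])
        yield {
            "form": cols[1],
            "lemma": cols[2],
            "upos": cols[3],
            "gloss": misc.get("Gloss", ""),
            "named_type": misc.get("NamedType", ""),
        }


words = list(syntactic_words(SAMPLE))
locations = [w["form"] for w in words if w["named_type"]]
print(len(words), "syntactic words; locations:", locations)
```

To run this on the real data, replace `SAMPLE` with the contents of the downloaded CoNLL-U file read as UTF-8 text.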

Bernhard, D., Ligozat, A.-L., Martin, F., Bras, M., Magistry, P., Vergez-Couret, M., Steiblé, L., Erhart, P., Hathout, N., Huck, D., Rey, C., Reynés, P., Rosset, S., Sibille, J., & Lavergne, T. (2018, May). Corpora with Part-of-Speech Annotations for Three Regional Languages of France: Alsatian, Occitan and Picard. 11th Edition of the Language Resources and Evaluation Conference. https://hal.science/hal-01704806

This article describes the creation of corpora with part-of-speech annotations for three regional languages of France: Alsatian, Occitan and Picard. These manual annotations were performed in the context of the RESTAURE project, whose goal is to develop resources and tools for these under-resourced French regional languages. The article presents the tagsets used in the annotation process as well as the resulting annotated corpora.

Last updated from the database: 23/06/2025 15:08 (UTC)