Full bibliography
Tokenization for Occitan (Gascon and Lengadocian)
Resource type
Software
Authors/contributors
- Vergez-Couret, Marianne (Programmer)
- Miletic, Aleksandra (Programmer)
Title
Tokenization for Occitan (Gascon and Lengadocian)
Abstract
A rule-based Python program for tokenising texts in Occitan.
To run the program, execute the following command:
python3 tokenizer_occitan.py < input.txt > output.conllu
The script takes as input a text file with one sentence per line. Each line starts with a sentence ID, followed by a tab character, followed by the sentence itself.
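The input layout described above can be checked before running the tokenizer. The following is a minimal sketch, not part of the published tool: the function name `read_sentences`, the sentence IDs, and the sample sentences are all hypothetical and serve only to illustrate the "ID, tab, sentence" format.

```python
import io


def read_sentences(stream):
    """Yield (sentence_id, sentence) pairs from 'ID<TAB>sentence' lines.

    Hypothetical helper for validating input files; it is not taken
    from tokenizer_occitan.py.
    """
    for line_number, raw in enumerate(stream, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        # Split on the first tab only, so tabs inside the sentence survive.
        sentence_id, sep, sentence = line.partition("\t")
        if not sep:
            raise ValueError(
                f"line {line_number}: no tab between sentence ID and sentence"
            )
        yield sentence_id, sentence


# Made-up sample input: two sentences, one per line.
sample = io.StringIO("s1\tLo gat dormís.\ns2\tQu'ei ua bèra jornada.\n")
pairs = list(read_sentences(sample))
```

A file that passes this check has the shape the abstract describes. Whether the tool itself rejects or tolerates malformed lines is not stated in the record.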
The current version of the tool was developed during the DIVITAL project (funded by the ANR) and the CorCoDial project (funded by the Academy of Finland).
Date
2024-06-24
Company
Zenodo
Library Catalog
Zenodo
Accessed
25/06/2024 09:43
Extra
Reference
Vergez-Couret, M., & Miletic, A. (2024). Tokenization for Occitan (Gascon and Lengadocian). Zenodo. https://doi.org/10.5281/zenodo.12515136