Lemmatization Experiments on Two Low-Resourced Languages: Low Saxon and Occitan

Miletić, Aleksandra; Siewert, Janine

doi:10.18653/v1/2023.vardial-1.17

Resource Type

Conference Paper

Authors/Contributors

Miletić, Aleksandra (Author)
Siewert, Janine (Author)

Title

Lemmatization Experiments on Two Low-Resourced Languages: Low Saxon and Occitan

Abstract

We present lemmatization experiments on the unstandardized low-resourced languages Low Saxon and Occitan using two machine-learning-based approaches represented by MaChAmp and Stanza. We show different ways to increase training data by leveraging historical corpora, small amounts of gold data and dictionary information, and discuss the usefulness of this additional data. In the results, we find some differences in the performance of the models depending on the language. This variation is likely to be partly due to differences in the corpora we used, such as the amount of internal variation. However, we also observe common tendencies, for instance that sequential models trained only on gold-annotated data often yield the best overall performance and generalize better to unknown tokens.

Date

2023

Proceedings Title

Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023)

Conference Name

Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023)

Place

Dubrovnik, Croatia

Publisher

Association for Computational Linguistics

Pages

163-173

Language

en

DOI

10.18653/v1/2023.vardial-1.17

Short Title

Lemmatization Experiments on Two Low-Resourced Languages

URL

https://aclanthology.org/2023.vardial-1.17

Accessed

13/05/2024 09:03

Library Catalog

DOI.org (Crossref)

Citation

Miletić, A., & Siewert, J. (2023). Lemmatization Experiments on Two Low-Resourced Languages: Low Saxon and Occitan. Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023), 163–173. https://doi.org/10.18653/v1/2023.vardial-1.17

Language (subject)

Occitan

Task

Morphosyntactic annotation

Document

Miletić et Siewert - 2023 - Lemmatization Experiments on Two Low-Resourced Lan.pdf

Link to this record

https://colaf.huma-num.fr/bibliography/TAAUIK2L
