Phone Inventories and Recognition for Every Language

Item Type
Conference Paper
Authors/Contributors
Li, X.; Metze, F.; Mortensen, D. R.; Black, A. W.; Watanabe, S.
Title
Phone Inventories and Recognition for Every Language
Abstract
Identifying phone inventories is a crucial component of language documentation and the preservation of endangered languages. However, even the largest collection of phone inventories covers only about 2000 languages, roughly one quarter of the world's languages. A majority of the remaining languages are endangered. In this work, we attempt to solve this problem by estimating the phone inventory for any language listed in Glottolog, which contains phylogenetic information for 8000 languages. In particular, we propose one probabilistic model and one non-probabilistic model, both of which use phylogenetic trees (“language family trees”) to measure the distance between languages. We show that our best model outperforms baseline models by 6.5 F1. Furthermore, we demonstrate that, with the proposed inventories, the phone recognition model can be customized for every language in the set, which improves the phone error rate (PER) in phone recognition by 25%.
Date
2022-06
Proceedings Title
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Conference Name
LREC 2022
Place
Marseille, France
Publisher
European Language Resources Association
Pages
1061–1067
Accessed
08/10/2024 09:25
Library Catalog
ACLWeb
Notes

older paper for transphone

Reference
Li, X., Metze, F., Mortensen, D. R., Black, A. W., & Watanabe, S. (2022). Phone Inventories and Recognition for Every Language. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, J. Odijk, & S. Piperidis (Eds.), Proceedings of the Thirteenth Language Resources and Evaluation Conference (pp. 1061–1067). European Language Resources Association. https://aclanthology.org/2022.lrec-1.114