Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble
Resource type
Conference Paper
Authors/contributors
- Li, Xinjian (Author)
- Metze, Florian (Author)
- Mortensen, David (Author)
- Watanabe, Shinji (Author)
- Black, Alan (Author)
- Muresan, Smaranda (Editor)
- Nakov, Preslav (Editor)
- Villavicencio, Aline (Editor)
Title
Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble
Abstract
Grapheme-to-Phoneme (G2P) has many applications in NLP and speech fields. Most existing work focuses heavily on languages with abundant training datasets, which limits the scope of target languages to less than 100 languages. This work attempts to apply zero-shot learning to approximate G2P models for all low-resource and endangered languages in Glottolog (about 8k languages). For any unseen target language, we first build the phylogenetic tree (i.e. language family tree) to identify top-k nearest languages for which we have training sets. Then we run models of those languages to obtain a hypothesis set, which we combine into a confusion network to propose a most likely hypothesis as an approximation to the target language. We test our approach on over 600 unseen languages and demonstrate it significantly outperforms baselines.
Date
2022-05
Proceedings Title
Findings of the Association for Computational Linguistics: ACL 2022
Conference Name
Findings 2022
Place
Dublin, Ireland
Publisher
Association for Computational Linguistics
Pages
2106–2115
Accessed
07/10/2024 11:41
Library Catalog
ACLWeb
Notes
transphone: generates grapheme-to-phoneme dictionaries for languages. Covers Breton, Occitan, and Picard (but not Alsatian).
https://github.com/xinjli/transphone
Used in "Meta Learning Text-to-Speech Synthesis in over 7000 Languages".
Same author as Allosaurus.
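The pipeline described in the abstract (pick the top-k nearest languages in the family tree, run their models, combine the hypotheses) can be sketched roughly as follows. This is a toy illustration, not the transphone implementation: the `LINEAGE` tree, the `TOY_MODELS` callables, and every function name are hypothetical, and a simple position-wise majority vote over equal-length hypotheses stands in for the confusion network used in the paper.

```python
from collections import Counter

# Hypothetical toy family tree: each language maps to its ancestor path
# from the root. Illustrative assumption, not Glottolog data.
LINEAGE = {
    "target": ["root", "A", "A1"],
    "lang1": ["root", "A", "A1"],
    "lang2": ["root", "A", "A2"],
    "lang3": ["root", "B", "B1"],
}

# Hypothetical per-language G2P models: each returns a phoneme list.
TOY_MODELS = {
    "lang1": lambda word: ["k", "a", "t"],
    "lang2": lambda word: ["k", "a", "d"],
    "lang3": lambda word: ["s", "a", "t"],
}


def tree_distance(a, b):
    """Count the ancestor nodes that the two languages do not share."""
    path_a, path_b = LINEAGE[a], LINEAGE[b]
    shared = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        shared += 1
    return (len(path_a) - shared) + (len(path_b) - shared)


def nearest_languages(target, candidates, k):
    """Return the k candidates closest to the target in the toy tree."""
    return sorted(candidates, key=lambda c: tree_distance(target, c))[:k]


def vote(hypotheses):
    """Position-wise majority vote (simplified stand-in for a confusion network)."""
    if len({len(h) for h in hypotheses}) != 1:
        raise ValueError("this sketch assumes equal-length hypotheses")
    return [Counter(column).most_common(1)[0][0] for column in zip(*hypotheses)]


def approximate_g2p(word, target, k=3):
    """Ensemble the toy models of the k nearest languages for an unseen target."""
    neighbours = nearest_languages(target, list(TOY_MODELS), k)
    return vote([TOY_MODELS[lang](word) for lang in neighbours])


print(approximate_g2p("cat", "target"))
```

A real confusion network would first align hypotheses of different lengths before voting; the equal-length restriction here only keeps the sketch short.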
Reference
Li, X., Metze, F., Mortensen, D., Watanabe, S., & Black, A. (2022). Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble. In S. Muresan, P. Nakov, & A. Villavicencio (Eds.), Findings of the Association for Computational Linguistics: ACL 2022 (pp. 2106–2115). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.findings-acl.166