Results | COLaF Bibliography

gweltou/breton-tts · Hugging Face. (2025, May 9). https://huggingface.co/gweltou/breton-tts

Gong, C., Cooper, E., Wang, X., Qiang, C., Geng, M., Wells, D., Wang, L., Dang, J., Tessier, M., Pine, A., Richmond, K., & Yamagishi, J. (2024). An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios (No. arXiv:2406.08911). arXiv. https://doi.org/10.48550/arXiv.2406.08911

Self-supervised learning (SSL) representations from massively multilingual models offer a promising solution for low-resource language speech tasks. Despite advancements, language adaptation in TTS systems remains an open problem. This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system proposed in our previous work. We conducted experiments on 12 languages using limited data with various fine-tuning configurations. We demonstrate that the similarity in phonetics between the pre-training and target languages, as well as the language category, affects the target language's adaptation performance. Additionally, we find that the fine-tuning dataset size and number of speakers influence adaptability. Surprisingly, we also observed that using paired data for fine-tuning is not always optimal compared to audio-only data. Beyond speech intelligibility, our analysis covers speaker similarity, language identification, and predicted MOS.

Lux, F., Meyer, S., Behringer, L., Zalkow, F., Do, P., Coler, M., Habets, E. A. P., & Vu, N. T. (2024, June 10). Meta Learning Text-to-Speech Synthesis in over 7000 Languages. arXiv. https://arxiv.org/abs/2406.06403v1

In this work, we take on the challenging task of building a single text-to-speech synthesis system that is capable of generating speech in over 7000 languages, many of which lack sufficient data for traditional TTS development. By leveraging a novel integration of massively multilingual pretraining and meta learning to approximate language representations, our approach enables zero-shot speech synthesis in languages without any available data. We validate our system's performance through objective measures and human evaluation across a diverse linguistic landscape. By releasing our code and models publicly, we aim to empower communities with limited linguistic resources and foster further innovation in the field of speech technology.

Corral, A., Leturia, I., Séguier, A., Barret, M., Dazéas, B., Boula de Mareüil, P., & Quint, N. (2020). Neural Text-to-Speech Synthesis for an Under-Resourced Language in a Diglossic Environment: the Case of Gascon Occitan. In D. Beermann, L. Besacier, S. Sakti, & C. Soria (Eds.), Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL) (pp. 53–60). European Language Resources Association. https://aclanthology.org/2020.sltu-1.8

Occitan is a minority language spoken in Southern France, some Alpine Valleys of Italy, and the Val d'Aran in Spain, which only very recently started developing language and speech technologies. This paper describes the first project for designing a Text-to-Speech synthesis system for one of its main regional varieties, namely Gascon. We used a state-of-the-art deep neural network approach, the Tacotron2-WaveGlow system. However, we faced two additional difficulties or challenges: on the one hand, we wanted to test if it was possible to obtain good quality results with fewer recording hours than is usually reported for such systems; on the other hand, we needed to achieve a standard, non-Occitan pronunciation of French proper names, therefore we needed to record French words and test phoneme-based approaches. The evaluation carried out over the various developed systems and approaches shows promising results with near production-ready quality. It has also allowed us to detect the phenomena for which some flaws or fall of quality occur, pointing at the direction of future work to improve the quality of the actual system and for new systems for other language varieties and voices.
