Your search
Results: 10 resources
-
In this position paper we argue that researchers interested in language and/or language technologies should attend to challenges of linguistic and algorithmic injustice together with language communities. We put forward that this can be done by drawing together diverse scholarly and experiential insights, building strong interdisciplinary teams, and paying close attention to the wider social, cultural and historical contexts of both language communities and the technologies we aim to develop.
-
This position paper concerns corpus-building strategies for minoritized languages in the Global North. It draws attention to the structure of the non-technical community of speakers, and concretely addresses how their needs can inform the design of technical solutions. Celtic Breton is taken as a case study for its relatively small speaker community, which is rather well-connected to modern technical infrastructures, and is bilingual with a non-English language (French). I report on three different community-internal initiatives that have the potential to facilitate the growth of NLP-ready corpora in FAIR practices (Findability, Accessibility, Interoperability, Reusability). These initiatives follow a careful analysis of the Breton NLP situation both inside and outside of academia, and take advantage of preexisting dynamics. They are integrated into the speaking community, on both small and larger scales. They share the goal of creating an environment that fosters virtuous circles, in which various actors help each other. It is the interactions between these actors that create quality-enriched corpora usable for NLP, once some low-cost technical solutions are provided. This work aims at providing an estimate of the community’s internal potential to grow its own pool of resources, given the right NLP resource-gathering tools and ecosystem design. Some projects reported here are in the early stages of conception, while others build on decade-long society/research interfaces for the building of resources. All call for feedback from both NLP researchers and the speaking communities, contributing to building bridges and fruitful collaborations between these two groups.
-
This chapter presents a survey of the current state of technologies for the automatic processing of the French language. It is based on a thorough analysis of existing tools and resources for French, and also provides an accurate presentation of the domain and its main stakeholders (Adda et al. 2022). The chapter documents the presence of French on the internet and describes in broad terms the existing technologies for the French language. It also spells out general conclusions and formulates recommendations for progress towards deep language understanding for French.
-
In the minds of a majority of French people, the so-called regional languages are nothing but "patois": crude distortions of French, vague idioms fit only for describing banalities. Why should they be moved by their disappearance? Yet every linguist knows that Basque, Breton, Alsatian, Corsican, Picard and the others have no reason to envy French, English, Arabic or Mandarin. The only difference between the "small languages" and the others is that the former did not have the good fortune to become the official language of a state. This book has an openly stated ambition: to reconcile France with its diversity, so that French remains our common language without becoming our only language.
-
Languages are classified as low-resource when they lack the quantity of data necessary for training statistical and machine learning tools and models. Causes of resource scarcity vary but can include poor access to technology for developing these resources, a relatively small population of speakers, or a lack of urgency for collecting such resources in bilingual populations where the second language is high-resource. As a result, the languages described as low-resource in the literature are as different as Finnish on the one hand, with millions of speakers using it in every imaginable domain, and Seneca, with only a small handful of fluent speakers using the language primarily in a restricted domain. While issues stemming from the lack of resources necessary to train models unite this disparate group of languages, many other issues cut across the divide between widely-spoken low-resource languages and endangered languages. In this position paper, we discuss the unique technological, cultural, practical, and ethical challenges that researchers and indigenous speech community members face when working together to develop language technology to support endangered language documentation and revitalization. We report the perspectives of language teachers, Master Speakers and elders from indigenous communities, as well as the point of view of academics. We describe an ongoing fruitful collaboration and make recommendations for future partnerships between academic researchers and language community stakeholders.
-
Picard appears to be a language very close to French, and the effects of linguistic proximity on language teaching should be more noticeable in its case than, for example, in the teaching of Russian to Czech speakers. Nevertheless, an examination of the three existing Picard textbooks, together with the opinions expressed by Picard speakers themselves and by ministerial bodies, gives reason to fear that the very close proximity of Picard to French annihilates the very possibility of teaching it, through an evaporation of its object as a language. Yet, within a conception of language teaching as the transmission of communicative competence, Picard can regain its place as a secondary pole co-organizing diglossia within the regional discursive space. This approach nonetheless requires work on (re)creating a Picard normative reference framework, intended to make the language visible to speakers/learners, and thus able to organize this discursive space as a diglossia.
-
With a view to the signing of the European Charter for Regional or Minority Languages, this report proposes a list of languages eligible to be registered as beneficiaries of the Charter. The report presents a list of 75 languages spoken by French nationals on the territory of the Republic.