-
Word classes are linguistic categories that serve as a basis for describing the vocabulary and grammar of natural languages. While important publications are regularly devoted to their definition, identification, and classification, in the field of Romance linguistics we lack a comprehensive, state-of-the-art overview of the current research. This Manual offers an updated and detailed discussion of all relevant aspects related to word classes in the Romance languages. In the first part, word classes are discussed from both a theoretical and historical point of view. The second part of the volume takes as its point of departure single word classes, described transversally in all the main Romance languages, while the third observes the relevant word classes from the point of view of specific Romance(-based) varieties. The fourth part explores Romance word classes at the interface of grammar and other fields of research. The Manual is intended as a reference work for all scholars and students interested in the description of both the standard, major Romance languages and the smaller, less-described Romance(-based) varieties.
-
Parallel corpora are still scarce for most of the world's language pairs. The situation is by no means different for regional languages of France. In addition, adequate web interfaces facilitate and encourage the use of parallel corpora by target users, such as language learners and teachers, as well as linguists. In this paper, we describe ParCoLab, a parallel corpus and a web platform for querying the corpus. From its outset, ParCoLab has been geared towards lower-resource languages, with an initial corpus in Serbian, along with French and English (later Spanish). We focus here on the extension of ParCoLab with a parallel corpus for four regional languages of France: Alsatian, Corsican, Occitan and Poitevin-Saintongeais. In particular, we detail criteria for choosing texts and issues related to their collection. The new parallel corpus contains more than 20k tokens per regional language.
-
This article describes the creation of corpora with part-of-speech annotations for three regional languages of France: Alsatian, Occitan and Picard. These manual annotations were performed in the context of the RESTAURE project, whose goal is to develop resources and tools for these under-resourced French regional languages. The article presents the tagsets used in the annotation process as well as the resulting annotated corpora.
-
With the support of the DGLFLF, ELDA conducted an inventory of existing language resources for the regional languages of France. The main aim of this inventory was to assess the exploitability of the identified resources within technologies. A total of 2,299 Language Resources were identified. As a second step, a deeper analysis of a set of three language groups (Breton, Occitan, overseas languages) was carried out, with a focus on their exploitability within three technologies: automatic translation, voice recognition/synthesis and spell checkers. The survey was followed by the organisation of the TLRF2015 Conference, which aimed to present the state of the art in the field of the Technologies for Regional Languages of France. The next step will be to activate the network of specialists built up during the TLRF conference and to begin the organisation of a second TLRF conference. Meanwhile, the French Ministry of Culture continues its actions related to linguistic diversity and technology, in particular through a project with Wikimedia France related to contributions to Wikipedia in regional languages, the upcoming new version of the “Corpus de la Parole” and the reinforcement of the DGLFLF's Observatory of Linguistic Practices.
-
We present a new major release of the OpenSubtitles collection of parallel corpora. The release is compiled from a large database of movie and TV subtitles and includes a total of 1689 bitexts spanning 2.6 billion sentences across 60 languages. The release also incorporates a number of enhancements in the preprocessing and alignment of the subtitles, such as the automatic correction of OCR errors and the use of meta-data to estimate the quality of each subtitle and score subtitle pairs.
-
This paper presents the current status of OPUS, a growing language resource of parallel corpora and related tools. The focus in OPUS is to provide freely available data sets in various formats together with basic annotation to be useful for applications in computational linguistics, translation studies and cross-linguistic corpus studies. In this paper, we report on new data sets and their features, additional annotation tools and models provided from the website and essential interfaces and on-line services included in the project.