COLaF

Corpus and Tools for the Languages of France

Through the COLaF project (Corpus et Outils pour les Langues de France, Corpus and Tools for the Languages of France), Inria aims to contribute to the development of free corpora and tools for French and other languages of France, in close collaboration with academic and institutional partners.

The scope of COLaF includes both:

  • Text data (ALMAnaCH, Inria Paris Centre),
  • Speech and sign language data (MULTISPEECH, Inria Centre at the University of Lorraine).

COLaF aims to cover French and the languages of France in all their diversity:

  • Coverage should be as diverse as possible: French from France and elsewhere, regional languages, French-based creoles (including those spoken outside France), indigenous languages, migrant languages, and French Sign Language.
  • All aspects of variation beyond the standard state of the language will be studied, including specialised languages, diachrony, and non-standard states (user-generated content, learner language, etc.).

Activity within the project notably covers the acquisition and structuring of texts from non-textual sources (books, audio recordings, etc.); the classification of large volumes of text by language and linguistic variety (in close connection with the OSCAR project); and the development of annotation and transformation models (translation, normalisation, voice synthesis, sign language generation) in support of corpus development and the exploitation of newly created resources.

COLaF is an Inria DEFI led by Benoît Sagot (head of the ALMAnaCH project team) and Slim Ouni (head of the MULTISPEECH project team).

Core Team

ALMAnaCH

Text data

Multispeech

Speech and sign language data

Partner laboratories and institutions

Agence régionale de la Langue Picarde

Picard

Lo Congrès

Occitan

The Team

ALMAnaCH

Benoît Sagot

Senior Researcher - Co-Leader

Thibault Clérice

Starting Faculty Position - Project manager

Rachel Bawden

Researcher

Djamé Seddah

University Professor

Rasul Dent

PhD Student

Oriane Nédey

Engineer

Juliette Janès

Engineer

Laurent Romary

Senior Researcher

Multispeech

Slim Ouni

University Professor - Co-Leader

Sam Bigeard

Engineer - Project manager

Mostafa Sadeghi

Researcher

Emmanuel Vincent

Senior Researcher

Vincent Colotte

University Professor

Results

The encoding scheme of COLaF-Text is available here.