Every two weeks, somewhere on Earth, a language dies. With it disappears an entire way of perceiving the world and human history accumulated over centuries. The UNESCO Atlas of the World’s Languages in Danger lists over 2,500 languages as vulnerable, endangered, or critically endangered. Of the approximately 7,000 languages spoken today, linguists estimate that half could fall silent by the end of this century.
Artificial intelligence, machine learning, and digital archiving are emerging as powerful tools for the documentation and transmission of threatened languages. Yet technology alone is not enough. What is at stake is not only the survival of sounds and grammars, but the preservation of an irreplaceable portion of humanity’s shared patrimony and the recognition of communities’ right to protect it.
The Digital Turn in Language Preservation
For decades, language documentation relied on fieldwork: linguists travelled to remote communities, recorded conversations on cassette tapes, and transcribed them by hand. The digital revolution has fundamentally changed this practice. Archives such as the Endangered Languages Archive (ELAR), established at SOAS University of London, and PARADISEC in Australia now allow communities to deposit, manage, and access recordings. Access rights to these multimedia materials are negotiated directly with speaker communities, which acknowledges their ownership of their own cultural data.
Artificial intelligence is opening new frontiers. Speech recognition models, automatic transcription tools, and machine translation systems are now being developed for languages that previously had no digital presence.
The Endangered Languages Project, supported by Google, has helped document over 3,500 languages. Mozilla’s Common Voice initiative has also begun collecting speech data in dozens of minority languages, laying the groundwork for voice technology in Breton, Basque, Kabyle, and Irish.
Guardians of a Living Heritage
What stands out in the contemporary landscape of language preservation is the depth of personal and collective commitment. Behind every archive and every recorded elder are individuals who have chosen to make linguistic heritage a defining purpose of their lives. They often work without institutional support, and sometimes against the indifference of the societies around them.
In Hawaii, a handful of families made a radical decision in the 1980s: to raise their children exclusively in Hawaiian. At the time, the language had dwindled to fewer than a thousand speakers. The Pūnana Leo immersion school movement grew from that grassroots commitment. Today it educates thousands of children and has reversed what once seemed an inevitable extinction.
Similarly, ordinary families in the Basque Country founded immersion schools called ikastolak, operating them in secret during the Franco dictatorship. Through sheer civic determination, they sustained a language with no known relatives anywhere on Earth.
These are not isolated cases. Across the world, communities are asserting that their language is not a relic to be mourned but a living heritage to be actively inhabited. A language preserved only in archives is a language embalmed. These communities insist upon something far more ambitious: transmission, use, and vitality. This is a heritage not stored, but lived.
Heritage, Rights, and the Law
The notion of linguistic heritage has gradually entered international law. Key instruments, including the 1992 European Charter for Regional or Minority Languages, the 2003 UNESCO Convention for the Safeguarding of the Intangible Cultural Heritage, and the 2007 UN Declaration on the Rights of Indigenous Peoples, reflect a growing consensus: linguistic heritage is not merely a private cultural preference but a matter of collective rights, which states have an obligation to protect and not merely to tolerate. Yet enforcement mechanisms remain weak and funding is chronically insufficient. Legal instruments alone do not address the economic pressures driving language shift.
France illustrates the tensions in this field. The 2021 Molac Law amended the Heritage Code to recognize a “linguistic heritage” comprising French and regional languages, committing the state to support their teaching and promotion. However, Article 2 of the Constitution states that “the language of the Republic is French.” On that basis, decisions of the Constitutional Council have blocked ratification of the European Charter and restricted immersive education, which limits the recognition of autonomous linguistic rights.
In the digital context, debates focus on cultural and data sovereignty. Open access to datasets is encouraged as a way to build language technologies, but it raises unresolved questions about who controls the archives. Threatened languages are protected as heritage, yet they are not clearly governed as community-owned data.
Heritage in the Digital Age: Opportunity and Risk
Technology has democratized access to preservation tools in ways previously unimaginable. A small community can now use open-source speech recognition software, train a language model on elders’ recorded speech, and create learning applications for younger generations, all without institutional intermediaries.
But the digital environment also introduces serious risks. Language data, including recordings, transcriptions, and oral literature, constitutes cultural property. When such data is ingested by large AI systems without consent, it raises profound questions of data sovereignty.
Communities may find their oral knowledge absorbed into commercial systems that they neither control nor benefit from. A heritage built over centuries can be extracted in seconds.
The framework of Indigenous Data Sovereignty, the right of communities to govern their own data, is gaining ground among legal scholars and activists. The deeper idea is clear: if linguistic heritage belongs to a community, then its digital traces belong to that community too. No algorithm should turn a community’s patrimony into a common resource without its explicit consent.
Links:
- https://news.un.org/en/story/2009/02/291652
- https://www.pcworld.com/article/465492/google_project_aims_to_document_endangered_languages.html
- https://kawaiola.news/hoonaauao/aha-punana-leo-the-preschool-program-that-inspired-a-movement/
- https://ich.unesco.org/doc/src/2003_Convention_Basic_Texts-_2022_version-EN.pdf
- https://en.wikisource.org/wiki/European_Charter_for_Regional_or_Minority_Languages
- https://www.nytimes.com/1980/01/06/archives/basque-ikastolas-saving-a-heritage.html
