Published:  09:15 AM, 28 December 2025

Experts are working on AI tools to sustain African native languages

AI linguistic diversity is the effort to make artificial intelligence (AI) work for all of the world's 7,000+ languages, not just English and a few others. Most AI systems are trained on data from a small number of languages, which leads to exclusion, bias and inequity for speakers of low-resource languages and threatens digital inclusion and cultural heritage. Closing this gap involves collecting diverse datasets, developing better models for underrepresented languages, and creating systems that understand cultural nuances, so that AI improves access to opportunities and services for everyone, not just dominant linguistic groups.

How do you teach somebody to read a language if there’s nothing for them to read? This is the problem facing developers across the African continent who are trying to train AI to understand and respond to prompts in local languages.

To train a language model, you need data. For a language like English, the easily accessible articles, books and manuals on the internet give developers a ready supply. But for most of Africa’s languages — of which there are estimated to be between 1,500 and 3,000 — there are few written resources available. Vukosi Marivate, a professor of computer science at the University of Pretoria, in South Africa, uses the number of available Wikipedia articles to illustrate the amount of available data. For English, there are over 7 million articles. Tigrinya, spoken by around 9 million people in Ethiopia and Eritrea, has 335. For Akan, the most widely spoken native language in Ghana.

Of those thousands of languages, only 42 are currently supported by a language model. Of Africa’s 23 scripts and alphabets, only three are available: Latin, Arabic and Ge’ez (used in the Horn of Africa). This underdevelopment “comes from a financial standpoint,” says Chinasa T. Okolo, the founder of Technecultura, a research institute working to advance global equity in AI. “Even though there are more Swahili speakers than Finnish speakers, Finland is a better market for companies like Apple and Google.”

If more language models are not developed, the impact across the continent could be dire, Okolo warns. “We’re going to continue to see people locked out of opportunity,” she told CNN. As the continent looks to develop its own AI infrastructure and capabilities, those who do not speak one of these 42 languages risk being left behind.

Okolo says AI developers across the continent “have to re-envision the way that we undertake model development in the first place.”

This is what Marivate has done. He led the South African arm of the African Next Voices project, which has made recordings of 18 languages in South Africa, Kenya and Nigeria.




The Asian Age