GOOGLE EXPANDS WAXAL SPEECH DATASET WITH KIKUYU & DHOLUO

Google has announced the addition of three East African languages, Kikuyu, Dholuo, and Luganda, to its WAXAL speech dataset.

This initiative, launched in Nairobi on February 2, 2026, aims to support the creation of voice-enabled artificial intelligence (AI) tools tailored for millions of speakers of African languages who have historically been underserved by mainstream technology.

The expansion seeks to directly address the gap in voice recognition and natural language processing for indigenous African languages. By incorporating these languages into the WAXAL (Widely Accessible Languages) dataset, Google intends to support developers in building more accessible AI applications, such as voice assistants, speech-to-text services, and localised educational and civic platforms.

The newly enhanced dataset is the result of a three-year collaborative effort and includes over 1,250 hours of naturally spoken, transcribed speech, supplemented by more than 20 hours of high-quality studio recordings. This rich linguistic resource is designed to provide the foundational data needed to train accurate and responsive AI models.

“The core mission of WAXAL is to empower communities across Africa,” stated Aisha Walcott-Bryant, Head of Google Research Africa. “For populations with limited access to English-dominant technologies, this dataset can be transformative, enabling innovations in healthcare, agriculture, and education that communicate directly with people in the languages they use every day.”

She added, “We’re providing students, researchers, and entrepreneurs with the tools to build technology on their own terms. This has the potential to reach and positively impact over 100 million people.”

The WAXAL project is a product of collaboration between Google and several African academic and community institutions, including Makerere University in Uganda, the University of Ghana, and Rwanda’s Digital Umuganda. Together, they have compiled speech data for 21 languages spoken across 25 African nations. 

In addition to the newly added languages, the dataset also includes widely spoken tongues such as Swahili, Hausa, Igbo, Yoruba, and Malagasy. To maximise local impact and encourage innovation, the WAXAL dataset has been released under a Creative Commons license.

This open-access approach grants developers across Africa broad rights to use, adapt, and build upon the data to create solutions relevant to their specific linguistic and cultural contexts.
