Mitesh Khapra
This is a collection of articles archived for the excellence of their content.
2025: On Time’s AI 100 list
[https://time.com/collections/time100-ai-2025/7305866/mitesh-khapra/ Tharin Pillay, Sep 7, 2025: Time]
In one way or another, nearly every Indian startup working on voice technology for the country’s many languages relies on the datasets of Mitesh Khapra and his team. Khapra, an associate professor of computer science at the Indian Institute of Technology Madras, recognized early on that “the reason Indian language technology is behind English is because we do not have enough data for Indian languages,” he says.
While Western models may perform well on highly represented languages like Hindi and Bengali, they are weaker on underrepresented languages. To close the gap, Khapra’s research lab AI4Bharat led a project that took researchers to almost 500 of India’s 700 districts, recording thousands of hours of voices from people with diverse educational and socioeconomic backgrounds to capture all 22 of India’s official languages.
Co-founded in 2019, AI4Bharat became an official partner for the Indian government’s Bhashini program, which uses AI to assist citizens in accessing digital services in their own languages; AI4Bharat supplies 80% of the data with its open-source dataset. It welcomes other developers utilizing its data for their models, too. Khapra says that if big tech companies use its data to make their models better at Hindi or Marathi, “it benefits the country at large.”
AI4Bharat’s AI models have been deployed in the Indian Supreme Court to translate official documents, and to create a voice bot that regional farmers can call to report issues with their government subsidy payments. The company’s latest project involves partnering with Sarvam AI—a startup launched by two other AI4Bharat co-founders—to build India’s first foundation model for the Indian government. Even if the model initially underperforms its Western counterparts, Khapra sees its creation as essential for the country’s sovereignty. “Unless we learn that skill, we will always be in a perpetually dependent position,” he says.
Already Khapra sees his work helping to reshape his nation’s academic research. He says that “15 years back, an average PhD student in India working on language technology would end up working on English problems,” but adds that “with these datasets available, I see a shift: now Indian students are working on Indian problems.”