Intel Helps Facilitate AI Language Recognition
December 9, 2021 | Intel
At the annual Conference on Neural Information Processing Systems (NeurIPS), two Intel-supported whitepapers on spoken language datasets are being presented. The first paper, The People’s Speech, targets “automatic speech recognition” tasks; the second is Multilingual Spoken Words Corpus (MSWC), which involves “keyword spotting.” Datasets coming out of each project contribute a sizeable volume of rich audio data, and each is among the largest collections available in its class.
The MSWC paper is co-authored by Keith Achorn, an AI frameworks engineer in Intel’s Software and Advanced Technology Group (SATG). Keith talks about his experiences on the project in a blog on the Intel Community site.
The People’s Speech and MSWC projects started in 2018, under the auspices of ML Commons, to identify and chart the 50 most used languages in the world into a single dataset, and then figure out a way to make the data useful. Group members came from Intel, Harvard, Alibaba, Oracle, Landing AI, University of Michigan, Google, Baidu and others.
In today’s diverse international, multilingual work environment, the ability to accurately transcribe and translate becomes increasingly important. With these datasets, a computer using artificial intelligence can “hear” a spoken word and produce an automatic transcript or translation.
Both projects utilize “diverse speech,” which means they better represent a natural environment, complete with background noise, informal speech patterns, and a mixture of recording equipment across different acoustic environments. This stands apart from highly controlled content such as audiobooks, which are more “sanitized.” Training on diverse speech has been correlated with better accuracy in real-world use.
The People’s Speech project includes tens of thousands of hours of supervised conversational audio. It is now among the world’s largest English speech recognition datasets licensed for academic and commercial usage, and is free to download.
MSWC is an audio speech dataset that has more than 300,000 keywords in dozens of languages, and can be accessed by smart devices. The MSWC dataset spans languages spoken by over 5 billion people, and advances the research and development of voice applications for a wide global audience.
Both datasets will be widely available to users. They are released under extremely permissive licensing terms that allow commercial use.