Machines That Learn Language More Like Kids Do
November 1, 2018 | MIT

Children learn language by observing their environment, listening to the people around them, and connecting the dots between what they see and hear. Among other things, this helps children establish their language’s word order, such as where subjects and verbs fall in a sentence.
Image Caption: MIT researchers have developed a “semantic parser” that learns through observation to more closely mimic a child’s language-acquisition process, which could greatly extend computing’s capabilities.
In computing, learning language is the task of syntactic and semantic parsers. These systems are trained on sentences annotated by humans that describe the structure and meaning behind words. Parsers are becoming increasingly important for web searches, natural-language database querying, and voice-recognition systems such as Alexa and Siri. Soon, they may also be used for home robotics.
But gathering the annotation data can be time-consuming and difficult for less common languages. Additionally, humans don’t always agree on the annotations, and the annotations themselves may not accurately reflect how people naturally speak.
In a paper being presented at this week’s Empirical Methods in Natural Language Processing conference, MIT researchers describe a parser that learns through observation to more closely mimic a child’s language-acquisition process, which could greatly extend the parser’s capabilities. To learn the structure of language, the parser observes captioned videos, with no other information, and associates the words with recorded objects and actions. Given a new sentence, the parser can then use what it’s learned about the structure of the language to accurately predict a sentence’s meaning, without the video.
This “weakly supervised” approach — meaning it requires limited training data — mimics how children can observe the world around them and learn language, without anyone providing direct context. The approach could expand the types of data and reduce the effort needed for training parsers, according to the researchers. A few directly annotated sentences, for instance, could be combined with many captioned videos, which are easier to come by, to improve performance.
In the future, the parser could be used to improve natural interaction between humans and personal robots. A robot equipped with the parser, for instance, could constantly observe its environment to reinforce its understanding of spoken commands, including when the spoken sentences aren’t fully grammatical or clear. “People talk to each other in partial sentences, run-on thoughts, and jumbled language. You want a robot in your home that will adapt to their particular way of speaking … and still figure out what they mean,” says co-author Andrei Barbu, a researcher in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Center for Brains, Minds, and Machines (CBMM) within MIT’s McGovern Institute.
The parser could also help researchers better understand how young children learn language. “A child has access to redundant, complementary information from different modalities, including hearing parents and siblings talk about the world, as well as tactile information and visual information, [which help him or her] to understand the world,” says co-author Boris Katz, a principal research scientist and head of the InfoLab Group at CSAIL. “It’s an amazing puzzle, to process all this simultaneous sensory input. This work is part of bigger piece to understand how this kind of learning happens in the world.”
Co-authors on the paper are: first author Candace Ross, a graduate student in the Department of Electrical Engineering and Computer Science and CSAIL, and a researcher in CBMM; Yevgeni Berzak PhD ’17, a postdoc in the Computational Psycholinguistics Group in the Department of Brain and Cognitive Sciences; and CSAIL graduate student Battushig Myanganbayar.
Visual Learner
For their work, the researchers combined a semantic parser with a computer-vision component trained in object, human, and activity recognition in video. Semantic parsers are generally trained on sentences annotated with code that ascribes meaning to each word and the relationships between the words. Some have been trained on still images or computer simulations.
The new parser is the first to be trained using video, Ross says. In part, videos are more useful in reducing ambiguity. If the parser is unsure about, say, an action or object in a sentence, it can reference the video to clear things up. “There are temporal components — objects interacting with each other and with people — and high-level properties you wouldn’t see in a still image or just in language,” Ross says.
The researchers compiled a dataset of about 400 videos depicting people carrying out a number of actions, including picking up an object or putting it down, and walking toward an object. Participants on the crowdsourcing platform Mechanical Turk then provided 1,200 captions for those videos. The researchers set aside 840 video-caption examples for training and tuning, and used the remaining 360 for testing. One advantage of using vision-based parsing is “you don’t need nearly as much data — although if you had [the data], you could scale up to huge datasets,” Barbu says.
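As a rough illustration of that 840/360 division, the sketch below splits 1,200 placeholder examples into a training-and-tuning set and a test set. The article does not say how the examples were assigned, so the seeded random shuffle, the function name `split_examples`, and the placeholder example names are all assumptions made for illustration.

```python
import random


def split_examples(examples, n_train, seed=0):
    """Shuffle a copy of the examples and cut it into two parts.

    The seeded shuffle is an assumption; the article does not describe
    how the researchers chose which examples went where.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder stand-ins for the 1,200 video-caption pairs.
examples = [f"example_{i}" for i in range(1200)]

train_and_tune, test = split_examples(examples, n_train=840)
print(len(train_and_tune), len(test))
```

With these numbers, 70 percent of the examples go to training and tuning and 30 percent are held out for testing.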
In training, the researchers gave the parser the objective of determining whether a sentence accurately describes a given video. They fed the parser a video and matching caption. The parser extracts possible meanings of the caption as logical mathematical expressions. The sentence, “The woman is picking up an apple,” for instance, may be expressed as: λxy.woman x, pick_up x y, apple y.
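To make the logical form above more concrete, here is a minimal sketch of how such an expression could be stored and checked against a set of known facts. It is not the researchers' implementation: the list-of-predicates encoding, the names `LOGICAL_FORM`, `WORLD_FACTS`, and `satisfying_bindings`, and the toy entities `person1` and `object1` are all hypothetical.

```python
from itertools import product

# Hypothetical encoding of: λxy. woman x, pick_up x y, apple y
# Each entry is (predicate name, tuple of variable names).
LOGICAL_FORM = [
    ("woman", ("x",)),
    ("pick_up", ("x", "y")),
    ("apple", ("y",)),
]

# Hypothetical facts a perception system might report about one video.
WORLD_FACTS = {
    ("woman", ("person1",)),
    ("apple", ("object1",)),
    ("pick_up", ("person1", "object1")),
}
ENTITIES = ["person1", "object1"]


def variables(form):
    """Return the variables of a logical form in order of first appearance."""
    ordered = []
    for _, args in form:
        for name in args:
            if name not in ordered:
                ordered.append(name)
    return ordered


def satisfying_bindings(form, facts, entities):
    """Return every variable-to-entity assignment that makes all predicates true."""
    names = variables(form)
    matches = []
    for combo in product(entities, repeat=len(names)):
        binding = dict(zip(names, combo))
        if all((pred, tuple(binding[a] for a in args)) in facts
               for pred, args in form):
            matches.append(binding)
    return matches


print(satisfying_bindings(LOGICAL_FORM, WORLD_FACTS, ENTITIES))
```

In this toy setting, the sentence counts as a description of the scene when at least one binding satisfies every predicate, and as a mismatch when none does.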
Those expressions and the video are then fed to a computer-vision algorithm called “Sentence Tracker,” developed by Barbu and other researchers. The algorithm looks at each video frame to track how objects and people transform over time, to determine if actions are playing out as described. In this way, it determines if the meaning is possibly true of the video.
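The idea of checking an action frame by frame can be sketched with a simple stand-in. The code below is not Sentence Tracker and makes no claim about how that algorithm works internally. It is a hand-written rule, under the assumptions that each frame gives an (x, y) position per tracked entity, that y increases upward, and that "picking up" means the object ends higher than it started while staying close to the person whenever it is off its starting height. The function name `picked_up` and the thresholds are hypothetical.

```python
import math


def picked_up(frames, agent, obj, min_rise=0.5, max_dist=1.2):
    """Toy test for 'agent picks up obj' over a sequence of frames.

    frames: list of dicts mapping a track name to an (x, y) position.
    Returns True if obj rises by at least min_rise overall and is within
    max_dist of agent in every frame where it is above its starting height.
    """
    start_height = frames[0][obj][1]
    end_height = frames[-1][obj][1]
    if end_height - start_height < min_rise:
        return False
    for frame in frames:
        lifted = frame[obj][1] > start_height
        if lifted and math.dist(frame[agent], frame[obj]) > max_dist:
            return False
    return True


# Hypothetical tracks: the person walks to the object, then the object rises.
frames = [
    {"person1": (0.0, 1.0), "object1": (0.2, 0.0)},
    {"person1": (0.1, 1.0), "object1": (0.2, 0.0)},
    {"person1": (0.2, 1.0), "object1": (0.2, 0.4)},
    {"person1": (0.2, 1.0), "object1": (0.2, 0.9)},
]

print(picked_up(frames, "person1", "object1"))
```

A real system would have to cope with noisy detections and many candidate tracks, which a fixed rule like this one ignores.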