Machines That Learn Language More Like Kids Do

November 1, 2018 | MIT


Connecting the Dots

The expression with the most closely matching representations for objects, humans, and actions becomes the most likely meaning of the caption. The expression, initially, may refer to many different objects and actions in the video, but the set of possible meanings serves as a training signal that helps the parser continuously winnow down possibilities. “By assuming that all of the sentences must follow the same rules, that they all come from the same language, and seeing many captioned videos, you can narrow down the meanings further,” Barbu says.
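The winnowing idea can be illustrated with a deliberately tiny sketch. It is not the researchers' parser, which the article describes as working with probabilities and structured expressions; this toy version only intersects sets. The captions, the concept labels such as `PICK_UP`, and the `winnow` function are all invented for illustration. Each word starts out possibly meaning anything detected in its video, and every further captioned video containing that word rules some candidates out.

```python
def winnow(captioned_videos):
    """For each word, keep only the concepts present in every video
    whose caption contains that word (toy cross-situational learning)."""
    candidates = {}
    for caption, detected in captioned_videos:
        for word in caption.lower().split():
            if word in candidates:
                candidates[word] &= detected      # rule out concepts absent here
            else:
                candidates[word] = set(detected)  # first sighting: anything goes
    return candidates

# Hypothetical data: (caption, concepts a vision system found in the video).
data = [
    ("woman picks apple", {"WOMAN", "PICK_UP", "APPLE"}),
    ("woman drops apple", {"WOMAN", "DROP", "APPLE"}),
    ("man picks apple", {"MAN", "PICK_UP", "APPLE"}),
    ("woman picks ball", {"WOMAN", "PICK_UP", "BALL"}),
]

meanings = winnow(data)
for word, concepts in sorted(meanings.items()):
    print(word, sorted(concepts))
```

In this toy data, "woman", "picks", and "apple" each narrow to a single concept, while "drops", "man", and "ball" appear in only one video and stay ambiguous until more captioned videos arrive.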

In short, the parser learns through passive observation: to determine whether a caption is true of a video, the parser must first identify the highest-probability meaning of the caption. “The only way to figure out if the sentence is true of a video [is] to go through this intermediate step of, ‘What does the sentence mean?’ Otherwise, you have no idea how to connect the two,” Barbu explains. “We don’t give the system the meaning for the sentence. We say, ‘There’s a sentence and a video. The sentence has to be true of the video. Figure out some intermediate representation that makes it true of the video.’”
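As a rough sketch of that intermediate step, suppose a caption has a handful of candidate meanings and a vision system reports how confident it is that each concept appears in the video. Picking the candidate that best fits the video might then look like the code below. The predicate names, the confidence values, and the choice to multiply confidences together are assumptions made for this example, not details from the research.

```python
def best_meaning(candidate_meanings, detector_scores):
    """Return the candidate whose predicates are jointly best supported
    by the video, scoring each candidate as a product of confidences."""
    def score(meaning):
        p = 1.0
        for predicate in meaning:
            # A predicate the detector never saw contributes zero support.
            p *= detector_scores.get(predicate, 0.0)
        return p
    return max(candidate_meanings, key=score)

# Hypothetical candidate meanings for one caption.
candidates = [
    ("WOMAN", "PICK_UP", "APPLE"),
    ("WOMAN", "DROP", "APPLE"),
]

# Hypothetical per-concept confidences from a vision system for one video.
scores = {"WOMAN": 0.9, "PICK_UP": 0.8, "DROP": 0.1, "APPLE": 0.7}

chosen = best_meaning(candidates, scores)
print(chosen)
```

Here the video strongly supports a pick-up action and only weakly supports a drop, so the first candidate is chosen as the caption's meaning.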

Training produces a syntactic and semantic grammar for the words the parser has learned. Given a new sentence, the parser no longer needs a video; it uses its grammar and lexicon to determine the sentence’s structure and meaning.

Ultimately, this process is learning “as if you’re a kid,” Barbu says. “You see [the] world around you and hear people speaking to learn meaning. One day, I can give you a sentence and ask what it means and, even without a visual, you know the meaning.”

“This research is exactly the right direction for natural language processing,” says Stefanie Tellex, a professor of computer science at Brown University who focuses on helping robots use natural language to communicate with humans. “To interpret grounded language, we need semantic representations, but it is not practicable to make it available at training time. Instead, this work captures representations of compositional structure using context from captioned videos. This is the paper I have been waiting for!”

In future work, the researchers are interested in modeling interactions, not just passive observations. “Children interact with the environment as they’re learning. Our idea is to have a model that would also use perception to learn,” Ross says.

This work was supported, in part, by the CBMM, the National Science Foundation, a Ford Foundation Graduate Research Fellowship, the Toyota Research Institute, and the MIT-IBM Brain-Inspired Multimedia Comprehension project.



