Stanford Researchers Teach Robots What Humans Want
June 26, 2019 | Stanford University

Told to optimize for speed while racing down a track in a computer game, a car pushes the pedal to the metal and proceeds to spin in a tight little circle. Nothing in the instructions told the car to drive straight, and so it improvised.
Image Caption: An example of how the robot arm uses survey questions to determine the preferences of the person using it. In this case, the person prefers trajectory #1 (T1) over trajectory #2. (Image credit: Andy Palan and Gleb Shevchuk)
This example—funny in a computer game but not so much in life—is among those that motivated Stanford University researchers to build a better way to set goals for autonomous systems.
Dorsa Sadigh, assistant professor of computer science and of electrical engineering, and her lab have combined two different ways of setting goals for robots into a single process, which performed better than either of its parts alone in both simulations and real-world experiments. The researchers presented the work June 24 at the Robotics: Science and Systems conference.
Researchers are trying to make it easier for humans to tell autonomous systems, such as vehicles and robots, what they want them to do. (Image credit: Getty Images)
“In the future, I fully expect there to be more autonomous systems in the world and they are going to need some concept of what is good and what is bad,” said Andy Palan, graduate student in computer science and co-lead author of the paper. “It’s crucial, if we want to deploy these autonomous systems in the future, that we get that right.”
The instructions given to robots are known as reward functions. The team’s new system for providing them combines demonstrations, in which humans show the robot what to do, and user preference surveys, in which people answer questions about how they want the robot to behave.
“Demonstrations are informative but they can be noisy. On the other hand, preferences provide, at most, one bit of information, but are way more accurate,” said Sadigh. “Our goal is to get the best of both worlds, and combine data coming from both of these sources more intelligently to better learn about humans’ preferred reward function.”
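The "at most one bit" remark can be checked with a short calculation. The sketch below is an illustration and is not taken from the researchers' code. It computes the Shannon entropy of a two-choice answer; the function name `answer_information_bits` and the probability `p_first` (how likely the person is to pick the first option before being asked) are assumptions made for this example.

```python
import math

def answer_information_bits(p_first):
    """Entropy, in bits, of a yes/no style answer.

    p_first is the assumed probability that the person picks the
    first of two options. A certain answer carries no information.
    """
    if p_first <= 0.0 or p_first >= 1.0:
        return 0.0
    p_second = 1.0 - p_first
    return -(p_first * math.log2(p_first) + p_second * math.log2(p_second))

# A question whose answer is a coin flip yields the full bit.
even_question = answer_information_bits(0.5)
# A question whose answer is mostly predictable yields less.
lopsided_question = answer_information_bits(0.9)
```

Under these assumptions, a survey question is most informative when the two options are equally likely to be chosen, and it can never yield more than one bit.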
Demonstrations and Surveys
In previous work, Sadigh had focused on preference surveys alone. These ask people to compare scenarios, such as two trajectories for an autonomous car. This method is efficient, but it could take as long as three minutes to generate the next question, which is still too slow for creating instructions for complex systems like a car.
To speed that up, the group later developed a way of producing multiple questions at once, which could be answered in quick succession by one person or distributed among several people. This update made the process 15 to 50 times faster than producing questions one by one.
The new combination system begins with a person demonstrating a behavior to the robot. That can give autonomous robots a lot of information, but the robot often struggles to determine what parts of the demonstration are important. People also don’t always want a robot to behave just like the human that trained it.
“We can’t always give demonstrations, and even when we can, we often can’t rely on the information people give,” said Erdem Biyik, a graduate student in electrical engineering who led the work developing the multiple-question surveys. “For example, previous studies have shown people want autonomous cars to drive less aggressively than they do themselves.”
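One way to picture the combination is as a two-step belief update: a demonstration gives a rough first guess about what the person values, and a survey answer then corrects it. The sketch below is a simplified illustration and not the researchers' method. The candidate reward weights, the two trajectory features (speed and smoothness), the exponential weighting of the demonstration, the logistic answer model, and the parameter `beta` are all assumptions made for this example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def demo_prior(candidates, demo_features, beta=1.0):
    """Weight each candidate reward by how well it scores the demonstration.

    Assumption: a candidate is more plausible the higher the reward it
    assigns to what the person actually demonstrated.
    """
    scores = [math.exp(beta * dot(w, demo_features)) for w in candidates]
    total = sum(scores)
    return [s / total for s in scores]

def preference_update(candidates, belief, feats_1, feats_2, prefers_first):
    """Bayes update of the belief after one 'T1 or T2?' survey answer.

    Assumption: the person picks T1 with logistic probability in the
    reward difference between the two trajectories.
    """
    updated = []
    for w, b in zip(candidates, belief):
        p_first = 1.0 / (1.0 + math.exp(dot(w, feats_2) - dot(w, feats_1)))
        updated.append(b * (p_first if prefers_first else 1.0 - p_first))
    total = sum(updated)
    return [u / total for u in updated]

# Hypothetical candidate reward weights over (speed, smoothness).
candidates = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]

# Step 1: the person demonstrates fast, somewhat jerky driving.
demo = (0.9, 0.2)
prior = demo_prior(candidates, demo, beta=2.0)
guess_from_demo = candidates[prior.index(max(prior))]

# Step 2: asked to compare, the person prefers the smoother T1
# over the faster T2.
belief = preference_update(candidates, prior, (0.3, 0.9), (0.9, 0.1), True)
guess_after_survey = candidates[belief.index(max(belief))]
```

In this toy setup the demonstration alone points to the speed-only reward, while a single survey answer shifts the best guess to the balanced one, echoing Biyik's point that people may want a car to drive less aggressively than they do themselves.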