Stanford Researchers Teach Robots What Humans Want
June 26, 2019 | Stanford University | Estimated reading time: 5 minutes
Sometimes demonstrations alone fail to convey the point of a task. For example, one demonstration in this study had people teach the robot arm to move until it pointed at a specific spot on the ground, and to do that while avoiding an obstacle and without moving above a certain height.
After a human ran the robot through its paces for 30 minutes, the robot tried to perform the task autonomously. It simply pointed straight up. It was so focused on learning not to hit the obstacle, it completely missed the actual goal of the task – pointing to the spot – and the preference for staying low.
That’s where the surveys come in, giving the robot a way of asking, for example, whether the user prefers that it move its arm low to the ground or up toward the ceiling. For this study, the group used the slower single-question method, but they plan to integrate multiple-question surveys in later work.
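To make the survey idea concrete, here is a minimal sketch of how one answer to a "low or high?" question could shift a robot's estimate of what the user wants. It is an illustration only, not the researchers' implementation: the linear reward, the softmax choice model, the three feature names and all numeric values are assumptions made for this example.

```python
import numpy as np

# Assumption: reward is linear in three hypothetical trajectory features:
# [average arm height, distance from pointer to target, obstacle clearance].

def update_belief(samples, belief, feat_a, feat_b, chose_a):
    """Reweight candidate reward weights after one A-versus-B answer.

    Uses a softmax (Bradley-Terry style) choice model: the user is assumed
    more likely to pick the option with the higher reward.
    """
    diffs = samples @ (feat_a - feat_b)          # reward(A) - reward(B) per sample
    p_choose_a = 1.0 / (1.0 + np.exp(-diffs))    # probability the user picks A
    likelihood = p_choose_a if chose_a else 1.0 - p_choose_a
    new_belief = belief * likelihood
    return new_belief / new_belief.sum()         # renormalise to sum to 1

# Start with no idea what the user wants: random unit-length weight vectors,
# all equally plausible.
rng = np.random.default_rng(0)
samples = rng.normal(size=(2000, 3))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
belief = np.full(len(samples), 1.0 / len(samples))

# One survey question: two trajectories that differ only in arm height.
low_arm = np.array([0.2, 0.5, 0.5])    # hypothetical feature values
high_arm = np.array([0.9, 0.5, 0.5])

# The user says they prefer the low trajectory (option A).
belief = update_belief(samples, belief, low_arm, high_arm, chose_a=True)
posterior_mean = belief @ samples      # the weight on height shifts negative
```

After a single answer, the estimated weight on arm height moves below zero, meaning the robot now treats height as something to avoid. The other two weights are left essentially unchanged because the two options did not differ on those features.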
In tests, the team found that combining demonstrations and surveys was faster than just specifying preferences and, when compared with demonstrations alone, about 80% of people preferred how the robot behaved when trained with the combined system.
“This is a step in better understanding what people want or expect from a robot,” said Sadigh. “Our work is making it easier and more efficient for humans to interact and teach robots, and I am excited about taking this work further, particularly in studying how robots and humans might learn from each other.”
Better, Faster, Smarter
People who used the combined method reported difficulty understanding what the system was getting at with some of its questions, which sometimes asked them to select between two scenarios that seemed the same or seemed irrelevant to the task – a common problem in preference-based learning. The researchers are hoping to address this shortcoming with easier surveys that also work more quickly.
Hand Coding and Reward Hacking
Another way to teach a robot is to write code that acts as instructions. The challenge is explaining exactly what you want a robot to do, especially if the task is complex. A common problem is known as “reward hacking,” where the robot figures out an easier, unintended way to reach the specified goals – such as a car spinning in circles to achieve the goal of going fast.
Biyik experienced reward hacking when he was programming a robot arm to grasp a cylinder and hold it in the air.
“I told it the hand must be closed, the object has to have height higher than X and the hand should be at the same height,” said Biyik. “The robot rolled the cylinder object to the edge of the table, hit it upward and then made a fist next to it in the air.”
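The loophole Biyik describes can be sketched in a few lines. The code below is a hypothetical reconstruction of the three conditions in his quote, not his actual program: the `State` fields, the threshold standing in for "X" and the height tolerance are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class State:
    hand_closed: bool
    object_height: float   # metres above the table (hypothetical units)
    hand_height: float
    object_in_hand: bool   # what we really care about, but never checked below

MIN_HEIGHT = 0.3   # stands in for "X" in the quote; the value is an assumption
HEIGHT_TOL = 0.02  # how close counts as "the same height"; also an assumption

def naive_reward(s: State) -> float:
    """Reward 1.0 when all three hand-coded conditions hold, else 0.0."""
    same_height = abs(s.hand_height - s.object_height) <= HEIGHT_TOL
    return float(s.hand_closed and s.object_height > MIN_HEIGHT and same_height)

# Intended behaviour: the cylinder is gripped and lifted.
intended = State(hand_closed=True, object_height=0.4, hand_height=0.4,
                 object_in_hand=True)

# Hacked behaviour: the cylinder is knocked into the air and the hand
# makes a fist beside it at the same height.
hacked = State(hand_closed=True, object_height=0.4, hand_height=0.41,
               object_in_hand=False)

intended_score = naive_reward(intended)
hacked_score = naive_reward(hacked)
```

Both states earn the full reward because nothing in the three conditions says the object has to be inside the hand. Adding that one check would close this particular loophole, though as the article notes, anticipating every such gap by hand is the hard part.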
“Looking to the future, it’s not 100% obvious to me what the right way to make reward functions is, but realistically you’re going to have some sort of combination that can address complex situations with human input,” said Palan. “Being able to design reward functions for autonomous systems is a big, important problem that hasn’t received quite the attention in academia as it deserves.”
The team is also interested in a variation on their system, which would allow people to simultaneously create reward functions for different scenarios. For example, a person may want their car to drive more conservatively in slow traffic and more aggressively when traffic is light.