Training Robots Without Robots Using Only Human Videos

Feb 1, 2025
Marion Lepert
Jiaying Fang
Jeannette Bohg
Human Video Editing with Synthetic Robot Arm
Abstract
Scaling robotics data collection is critical to advancing general-purpose robots. Current approaches often rely on teleoperated demonstrations, which are difficult to scale. We propose a novel data collection method that eliminates the need for robotics hardware by leveraging human video demonstrations. By training imitation learning policies on this human data, our approach enables zero-shot deployment on robots without collecting any robot-specific data. To bridge the embodiment gap between human and robot appearances, we apply a data editing approach to the input observations that aligns the image distributions of the human training data and the robot test data. Our method significantly reduces the cost of diverse data collection by allowing anyone with an RGBD camera to contribute. We demonstrate that our approach works in diverse, unseen environments and on varied tasks.
Publication
Submitted