• Author(s): Xuxin Cheng, Jialong Li, Shiqi Yang, Ge Yang, Xiaolong Wang

“Open-TeleVision: Teleoperation with Immersive Active Visual Feedback” introduces Open-TeleVision, a teleoperation system for collecting on-robot data for robot learning from demonstrations. The system aims to make teleoperation more intuitive and easier to use, both of which are crucial for collecting high-quality, diverse data at scale.

Open-TeleVision leverages immersive virtual reality (VR) technology to provide operators with an active visual feedback system. This system allows operators to perceive the robot’s surroundings in a stereoscopic manner, creating an immersive experience where the operator’s arm and hand movements are mirrored by the robot. This setup makes it feel as though the operator’s mind is transmitted into the robot’s embodiment, enhancing the sense of presence and control.

The system is validated through experiments involving data collection and training imitation learning policies on four long-horizon, precise tasks: Can Sorting, Can Insertion, Folding, and Unloading. These tasks are performed using two different humanoid robots, demonstrating the versatility and effectiveness of Open-TeleVision in real-world applications.

A significant innovation of Open-TeleVision is its use of a single active stereo RGB camera mounted on the robot’s head, which mimics human head movements to observe a large workspace. During teleoperation, the camera moves in sync with the operator’s head, streaming real-time, ego-centric 3D observations to the VR device. This first-person active sensing gives the user a more intuitive way to explore the environment and focus on the regions that matter for detailed interactions. For imitation learning, the active camera lets the policy imitate head movements along with manipulation actions. Because the camera looks at the task-relevant region rather than a wide, static view, there are fewer pixels to process, which enables smooth, real-time, and precise control.
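The head-tracking idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the two-joint (yaw, pitch) neck, the joint limits, and the axis convention (x forward, y left, z up, with rotation composed as yaw, then pitch, then roll) are all assumptions made for illustration. The sketch takes the operator's head orientation as a 3x3 rotation matrix, extracts yaw and pitch, discards roll, and clamps the result to the neck's range.

```python
import math


def rotation_from_ypr(yaw, pitch, roll):
    """Build R = Rz(yaw) @ Ry(pitch) @ Rx(roll) as nested lists (assumed convention)."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def head_to_neck_command(head_rotation, yaw_limit=1.0, pitch_limit=0.6):
    """Map a head rotation matrix to clamped (yaw, pitch) neck targets in radians.

    The limits are hypothetical placeholders, not values from the paper.
    Roll is dropped because the assumed neck has only two joints.
    """
    yaw = math.atan2(head_rotation[1][0], head_rotation[0][0])
    # Guard asin against tiny numerical overshoot outside [-1, 1].
    pitch = math.asin(clamp(-head_rotation[2][0], -1.0, 1.0))
    return (
        clamp(yaw, -yaw_limit, yaw_limit),
        clamp(pitch, -pitch_limit, pitch_limit),
    )


if __name__ == "__main__":
    # Simulated headset reading: slight turn and tilt, with some roll.
    head = rotation_from_ypr(0.3, -0.2, 0.1)
    print(head_to_neck_command(head))
```

In a real system this mapping would run every frame on the pose reported by the VR headset, and the resulting neck targets could be logged alongside the arm and hand commands so that a policy can learn both. The sign and axis conventions would need to match the actual headset and robot.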

The paper also highlights the benefits of streaming stereoscopic video from the robot’s perspective to the operator’s eyes, which enhances spatial understanding and task performance. User studies confirm the importance of stereo perception for spatial awareness, although the system currently lacks other forms of feedback, such as haptic feedback, which could further improve performance in tactile-intensive tasks.

“Open-TeleVision: Teleoperation with Immersive Active Visual Feedback” presents a significant advancement in teleoperation systems. By integrating VR technology and active visual feedback, the authors provide a powerful tool for precise and long-horizon manipulation tasks. This research has important implications for various applications, including robotics, industrial automation, and remote operation, making teleoperation more intuitive and effective for users. The system is open-sourced, promoting further research and development in this field.