
Video-Touch: Multi-User Remote Robot Control in Google Meet call by DNN-based Gesture Recognition

Blog post from Google Cloud

Post Details
Company: Google Cloud
Date Published:
Author: -
Word Count: 1,785
Language: English
Hacker News Points: -
Summary

Video-Touch is a system that lets multiple users remotely control a robot from inside a video conferencing application such as Google Meet, using DNN-based gesture recognition for teleoperation. Developed during the COVID-19 pandemic, it relies on computer vision, specifically MediaPipe, to recognize each user's hand gestures and transmit them to the robot in real time, so no additional devices are needed. The pipeline has three stages: software such as OBS captures the video stream, a modified MediaPipe hand tracking module extracts the gestures, and ZeroMQ carries the resulting data in real time to a Python-based robot control module. The robot is equipped with high-density tactile sensors for dexterous manipulation, which give users tactile feedback about the properties of the objects it handles. Potential applications include remote collaboration and operation in challenging environments. The current implementation still has limitations in latency, depth perception, and gesture convenience. Future updates are expected to address these and may extend the approach to other hardware, such as drones or mobile robots.
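As a rough illustration of the hand-off between the gesture module and the robot control module, the sketch below encodes one frame of hand landmarks as a JSON message and decodes it into a robot command. It is not the project's code. The message layout, the `RobotCommand` fields, the workspace size, and the pinch threshold are all assumptions made for illustration. The only detail taken from MediaPipe is the hand landmark layout: 21 points per hand with normalized coordinates, where index 0 is the wrist, 4 the thumb tip, and 8 the index fingertip. The transport is left out; in the system described above these bytes would travel over ZeroMQ.

```python
import json
import math
from dataclasses import dataclass

# MediaPipe Hands landmark indices (21 points per hand, normalized x/y in [0, 1]).
WRIST, THUMB_TIP, INDEX_TIP = 0, 4, 8
NUM_LANDMARKS = 21


@dataclass
class RobotCommand:
    """Hypothetical command format; the real control module may differ."""
    x_m: float            # target position along the workspace width, in meters
    y_m: float            # target position along the workspace depth, in meters
    gripper_closed: bool  # True when the user pinches thumb and index finger


def encode_landmarks(user_id, landmarks):
    """Serialize one frame of landmarks (list of [x, y, z]) to bytes.

    Assumed message layout: {"user": ..., "landmarks": [[x, y, z], ...]}.
    """
    return json.dumps({"user": user_id, "landmarks": landmarks}).encode("utf-8")


def decode_command(payload, workspace_m=(0.4, 0.3), pinch_threshold=0.05):
    """Turn a landmark message into a RobotCommand.

    workspace_m and pinch_threshold are illustrative values, not measured ones.
    """
    message = json.loads(payload.decode("utf-8"))
    points = message["landmarks"]
    if len(points) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(points)}")

    wrist_x, wrist_y = points[WRIST][0], points[WRIST][1]
    thumb, index = points[THUMB_TIP], points[INDEX_TIP]
    # Pinch distance in normalized image coordinates (depth is ignored).
    pinch = math.hypot(thumb[0] - index[0], thumb[1] - index[1])

    return RobotCommand(
        x_m=wrist_x * workspace_m[0],
        # Image y grows downward, so flip it to make "up" mean "farther".
        y_m=(1.0 - wrist_y) * workspace_m[1],
        gripper_closed=pinch < pinch_threshold,
    )


if __name__ == "__main__":
    frame = [[0.5, 0.5, 0.0] for _ in range(NUM_LANDMARKS)]
    frame[THUMB_TIP] = [0.50, 0.40, 0.0]
    frame[INDEX_TIP] = [0.52, 0.41, 0.0]
    print(decode_command(encode_landmarks("user-1", frame)))
```

Because the pinch test uses only the image-plane coordinates, it is sensitive to how far the hand is from the camera, which is one way the depth-perception limitation mentioned in the summary could show up in practice.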