Gemma 4 VLA Demo on Jetson Orin Nano Super
Blog post from Hugging Face
This demo turns Gemma 4 into a voice-activated assistant that runs locally on the NVIDIA Jetson Orin Nano Super. A speech-to-text (STT) model transcribes the user's spoken question, Gemma 4 produces the answer, and a text-to-speech (TTS) model speaks it back. Gemma 4 also decides on its own whether a question requires visual input. When it does, the assistant captures an image from a connected webcam and analyzes it. No hardcoded logic or keyword triggers determine when the camera is used.

Setup takes a single script, available on GitHub, that downloads the required models and configures the environment. Gemma 4 is accessed through a local server, and the assistant can run in either audio mode or text-only mode. The tutorial also lists the hardware requirements and gives troubleshooting tips for getting good performance out of the Jetson Orin Nano. Overall, the demo shows that advanced AI models can run usefully on compact edge hardware.