RynnEC: Bringing MLLMs into Embodied World
Blog post from Hugging Face
RynnEC is a multimodal large language model (MLLM) developed by Alibaba DAMO Academy for embodied cognition, built around video-centric object and spatial understanding. Unlike conventional models trained on internet-scale images, RynnEC is trained on egocentric video to improve the fine-grained visual understanding and spatial awareness that real-world robotic tasks require. It needs no explicit 3D input: working from RGB video alone, it maps user queries to semantic masks, which makes it straightforward to integrate into embodied agents. Training is supported by a scalable data pipeline that converts raw videos into embodied cognition tasks, including object captioning and spatial reasoning, drawing on 20,000 videos from diverse home environments. With a structured four-stage training process, RynnEC improves object and spatial cognition and outperforms other advanced MLLMs, including Gemini-2.5 Pro, on the RynnEC-Bench benchmark. This positions RynnEC as a tool for enhancing the interactivity and cognitive capabilities of robots in complex real-world scenarios.
| Trend | Post Mentions | Total Month Mentions | Posts | Companies | MoM Change |
|---|---|---|---|---|---|
| Data Pipeline | 3 | 564 | 156 | 67 | +17% |
| LLM | 3 | 3,922 | 600 | 189 | -6% |
| AI Model Fine-tuning | 2 | 568 | 107 | 59 | -14% |