Deploy and run inference on any model from Hugging Face
Blog post from Together AI
Developers are experiencing a shift in how they work, driven by agents that take on tasks like containerization and inference-server configuration, work that previously required specialized expertise or extensive self-education. This post illustrates that shift with Goose, a CLI agent runner, paired with Together's Dedicated Container Inference (DCI) infrastructure, which together made it possible to deploy Netflix's void-model from Hugging Face without the usual setup delays.

By combining Goose with Together's skills, developers can bridge knowledge gaps and ship models to a production-grade environment with minimal effort. The setup demonstrated here was simple: install a skill, run a short prompt, and let the agent handle the rest.

Together's DCI provides a private, GPU-backed environment for running new models, so developers don't have to manage their own infrastructure and can experiment with new models as soon as they become available. That flexibility lets them focus on building rather than on operational hurdles, and it significantly shrinks the gap between a model's release and its practical application.
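Once the agent has brought a dedicated endpoint up, it can be queried like any OpenAI-compatible chat API. Below is a minimal sketch using only the Python standard library; the model identifier is a placeholder (the post does not give one), and the URL assumes Together's standard chat-completions route:

```python
import json
import os
import urllib.request

# Together's OpenAI-compatible chat-completions route (assumed here;
# check your endpoint's details in the Together dashboard).
API_URL = "https://api.together.xyz/v1/chat/completions"


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Assemble an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(model: str, prompt: str) -> str:
    """POST the payload to the endpoint and return the first reply's text."""
    body = json.dumps(build_chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            # Expects your API key in the TOGETHER_API_KEY environment variable.
            "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]


# Usage (requires a live endpoint and API key):
#   chat("your-org/your-deployed-model", "Summarize this log line: ...")
```

The model name passed to `chat` would be whatever identifier your dedicated endpoint exposes; everything else about the request shape follows the common OpenAI-compatible convention.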