Distributing AI Models into Self-Hosted Environments - Lessons from Replicated and H2O.ai
Blog post from Replicated
Deploying AI models into self-hosted or air-gapped environments presents distinct challenges, particularly for large language models (LLMs). H2O.ai has developed an architecture that separates model artifacts from serving infrastructure so that deployment does not compromise performance. Key components include a self-contained Kubernetes environment, a service for managing application components, and an object storage service for large model artifacts. The architecture supports multiple deployment options, letting customers integrate their own storage solutions or use cloud-based services.

H2O.ai emphasizes several best practices:

- Package models differently depending on their size.
- Design for offline (air-gapped) environments from the start.
- Install with Helm or Replicated.
- Store large models in external object storage rather than container registries, which are not designed for multi-gigabyte artifact layers.
- Retain ownership of fine-tuned models and optimize model-serving parameters.
- Ensure dev/prod parity through CI/CD pipelines.

Together, these practices offer a flexible and reliable way to distribute AI models in demanding enterprise environments.
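As a minimal illustration of the size-based packaging practice (this is a sketch, not H2O.ai's actual tooling; the function name, 1 GiB cutoff, and return labels are assumptions for illustration):

```python
# Sketch: decide where a model artifact should live in a release bundle.
# Small models can be baked into the container image; large LLM weights
# are better referenced from external object storage so the container
# registry is not overloaded with multi-gigabyte layers.

GIB = 1024 ** 3
EMBED_LIMIT = 1 * GIB  # assumed cutoff; real limits depend on the registry


def packaging_strategy(artifact_size_bytes: int) -> str:
    """Return the distribution channel for a model artifact of the given size."""
    if artifact_size_bytes <= EMBED_LIMIT:
        # Small enough to ship inside the container image itself.
        return "container-image"
    # Large artifacts are uploaded to object storage and fetched at
    # install or startup time (e.g. by an init container).
    return "object-storage"
```

In an air-gapped installation, the "object-storage" path would point at a customer-provided, in-network storage service rather than a public cloud bucket.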