Inference with LoRA adapter models
Blog post from DeepInfra
DeepInfra provides a platform for deploying LoRA (Low-Rank Adaptation) adapters: small sets of weights that specialize a base model for a particular task. Adapters are hosted on Hugging Face, and a Hugging Face token is needed only when the adapter repository is private.

Deploying an adapter involves selecting a compatible base model (the DeepInfra site lists which base models are supported) and filling out a deployment form. Note that rate limits apply collectively to all LoRA adapters that share the same base model, not per adapter. A sketch of querying a deployed adapter follows below.

LoRA adapters are priced 50% higher than their base models and run 50-60% slower, since the adapter weights add computation at inference time; merging an adapter into its base model restores base-model speed, as sketched at the end of this section. More broadly, DeepInfra offers scalable AI hosting on managed GPU infrastructure and claims competitive pricing and uptime.
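Once an adapter is deployed, it can be queried like any other DeepInfra model. Below is a minimal sketch using DeepInfra's OpenAI-compatible endpoint; the adapter identifier and the environment variable name are placeholders, to be replaced with whatever your deployment dashboard shows.

```python
import os
from openai import OpenAI

# DeepInfra exposes an OpenAI-compatible endpoint, so the standard
# openai client works with a custom base_url.
client = OpenAI(
    api_key=os.environ["DEEPINFRA_API_KEY"],          # your DeepInfra API key
    base_url="https://api.deepinfra.com/v1/openai",
)

resp = client.chat.completions.create(
    model="your-org/your-lora-adapter",               # placeholder adapter identifier
    messages=[{"role": "user", "content": "Summarize LoRA in one sentence."}],
)
print(resp.choices[0].message.content)
```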
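To make the pricing and speed trade-off concrete, here is a back-of-the-envelope calculation. The price and throughput figures are made up for illustration, not DeepInfra's actual rates; only the 50% surcharge and the 50-60% slowdown come from the post.

```python
# Illustrative arithmetic only: base figures below are hypothetical.
base_price_per_mtok = 0.10                # hypothetical base-model price, $/M tokens
adapter_price_per_mtok = base_price_per_mtok * 1.5   # adapters cost 50% more

base_tokens_per_sec = 100.0               # hypothetical base-model throughput
adapter_tokens_per_sec = base_tokens_per_sec * (1 - 0.55)  # ~50-60% slower

print(f"adapter price: ${adapter_price_per_mtok:.2f}/M tokens")
print(f"adapter throughput: ~{adapter_tokens_per_sec:.0f} tokens/s")
```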
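If the slowdown matters, the merge route mentioned above folds the adapter weights into the base model ahead of time, so inference runs at base-model speed. A sketch using Hugging Face's peft library, with placeholder model and adapter names:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Placeholder identifiers: substitute your actual base model and adapter repo.
base = AutoModelForCausalLM.from_pretrained("base-org/base-model")
model = PeftModel.from_pretrained(base, "your-org/your-lora-adapter")

merged = model.merge_and_unload()         # folds the LoRA deltas into the base weights
merged.save_pretrained("merged-model")    # deploy this merged checkpoint instead
```

The trade-off is that a merged model is a full standalone checkpoint: it no longer shares weights (or rate limits) with other adapters on the same base.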