Inference with LoRA adapter models
Blog post from DeepInfra
DeepInfra provides a platform for deploying LoRA (Low-Rank Adaptation) adapters: small sets of weights that specialize a base model for a particular task. Adapters are hosted on Hugging Face, and a Hugging Face token is needed only when the adapter repository is private.

Deploying an adapter involves selecting a compatible base model (the DeepInfra site lists which base models are supported) and filling out a deployment form. Note that rate limits apply collectively to all LoRA adapters that share the same base model, not per adapter. A sketch of querying a deployed adapter follows below.

LoRA adapters are priced 50% higher than their base models and run 50-60% slower, since the adapter weights add computation at inference time; merging an adapter into its base model restores base-model speed, as sketched at the end of this section. More broadly, DeepInfra offers scalable AI hosting on managed GPU infrastructure and claims competitive pricing and uptime.
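Once an adapter is deployed, it can be queried like any other DeepInfra model. Below is a minimal sketch using DeepInfra's OpenAI-compatible endpoint; the adapter identifier and the environment variable name are placeholders, to be replaced with whatever your deployment dashboard shows.

```python
import os
from openai import OpenAI

# DeepInfra exposes an OpenAI-compatible endpoint, so the standard
# openai client works with a custom base_url.
client = OpenAI(
    api_key=os.environ["DEEPINFRA_API_KEY"],          # your DeepInfra API key
    base_url="https://api.deepinfra.com/v1/openai",
)

resp = client.chat.completions.create(
    model="your-org/your-lora-adapter",               # placeholder adapter identifier
    messages=[{"role": "user", "content": "Summarize LoRA in one sentence."}],
)
print(resp.choices[0].message.content)
```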
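To make the pricing and speed trade-off concrete, here is a back-of-the-envelope calculation. The price and throughput figures are made up for illustration, not DeepInfra's actual rates; only the 50% surcharge and the 50-60% slowdown come from the post.

```python
# Illustrative arithmetic only: base figures below are hypothetical.
base_price_per_mtok = 0.10                # hypothetical base-model price, $/M tokens
adapter_price_per_mtok = base_price_per_mtok * 1.5   # adapters cost 50% more

base_tokens_per_sec = 100.0               # hypothetical base-model throughput
adapter_tokens_per_sec = base_tokens_per_sec * (1 - 0.55)  # ~50-60% slower

print(f"adapter price: ${adapter_price_per_mtok:.2f}/M tokens")
print(f"adapter throughput: ~{adapter_tokens_per_sec:.0f} tokens/s")
```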
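If the slowdown matters, the merge route mentioned above folds the adapter weights into the base model ahead of time, so inference runs at base-model speed. A sketch using Hugging Face's peft library, with placeholder model and adapter names:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Placeholder identifiers: substitute your actual base model and adapter repo.
base = AutoModelForCausalLM.from_pretrained("base-org/base-model")
model = PeftModel.from_pretrained(base, "your-org/your-lora-adapter")

merged = model.merge_and_unload()         # folds the LoRA deltas into the base weights
merged.save_pretrained("merged-model")    # deploy this merged checkpoint instead
```

The trade-off is that a merged model is a full standalone checkpoint: it no longer shares weights (or rate limits) with other adapters on the same base.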