
Announcing Serverless Multi-LoRA: Fine-tune and deploy hundreds of adapters for model customization at scale

Blog post from Together AI

Company: Together AI
Word Count: 1,224
Language: English
Summary

Serverless LoRA inference with pay-per-token pricing lets users upload their own LoRA adapters and run inference on them alongside a compatible serverless base model, including popular models such as Llama 3.1 and Qwen 2.5. The platform switches adapters dynamically at scale, running hundreds of fine-tuned variants for the same price as a single base model. This enables cost-efficient model customization, faster iteration and experimentation, optimized performance at scale, and easy fine-tuning of custom LoRA adapters with the Together Fine-tuning API.
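As a rough sketch of how serving an uploaded adapter might look from the caller's side: Together exposes an OpenAI-compatible chat completions endpoint, so requesting a LoRA-customized model is plausibly just a matter of passing the adapter's model ID in place of the base model name. The adapter ID `my-org/Meta-Llama-3.1-8B-Instruct-my-adapter` below is hypothetical, and the snippet only constructs the request payload rather than sending it.

```python
import json

# Hypothetical adapter model ID -- uploaded LoRA adapters are assumed to be
# addressable by a user-scoped name alongside the serverless base model.
ADAPTER_MODEL = "my-org/Meta-Llama-3.1-8B-Instruct-my-adapter"

def build_request(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-compatible chat completion payload.

    Swapping `model` between the base model and an adapter ID is the only
    change needed to route a request to a different fine-tuned variant.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_request(ADAPTER_MODEL, "Summarize LoRA in one sentence.")
# In a real call this payload would be POSTed to Together's chat completions
# endpoint with an Authorization: Bearer <TOGETHER_API_KEY> header.
print(json.dumps(payload, indent=2))
```

Because adapter selection is just a field in the request, dynamically switching among hundreds of adapters requires no per-adapter deployment on the caller's side.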