Content Deep Dive
Announcing Serverless Multi-LoRA: Fine-tune and deploy hundreds of adapters for model customization at scale
Blog post from Together AI
Post Details
Company:
Date Published:
Author: Together AI
Word Count: 1,224
Language: English
Hacker News Points: -
Summary
Together AI's Serverless LoRA inference lets users upload their own LoRA adapters and run pay-per-token inference on them against a compatible serverless base model, such as Llama 3.1 or Qwen 2.5. Because adapters are switched dynamically at request time, hundreds of fine-tuned variants can be served for the same price as a single base model. The result is cost-efficient model customization, faster iteration and experimentation, optimized performance at scale, and straightforward fine-tuning of custom adapters through the Together Fine-tuning API.
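The workflow the summary describes can be sketched against Together's OpenAI-compatible chat completions endpoint. This is a minimal illustration, not the official client: the adapter name `my-org/llama-3.1-8b-support-adapter` is hypothetical (in practice you would use the model identifier returned after uploading or fine-tuning your adapter), and the request is built but not sent.

```python
import json
import urllib.request

# Together's OpenAI-compatible chat completions endpoint.
API_URL = "https://api.together.xyz/v1/chat/completions"

def build_lora_request(prompt, adapter_model, api_key):
    """Build a chat-completion request that targets a specific LoRA adapter.

    `adapter_model` is a hypothetical adapter identifier: because adapters
    are switched dynamically on a shared base model, selecting a different
    fine-tuned variant is just a matter of naming a different model here.
    """
    payload = {
        "model": adapter_model,  # LoRA adapter served on the shared base model
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Two requests to two different adapters cost the same as hitting the
# base model directly, since both ride on the same serverless deployment.
req = build_lora_request(
    "Summarize this support ticket.",
    "my-org/llama-3.1-8b-support-adapter",  # hypothetical adapter name
    "API_KEY",
)
print(json.loads(req.data)["model"])
```

Sending the request (e.g. with `urllib.request.urlopen`) would then return a standard chat-completion response, billed per token against the base model's rate.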