
Scaling Large Language Models to zero with Ollama

What's this blog post about?

Fly.io is a platform that runs code on powerful servers worldwide, close to users, including GPUs for self-hosted AI. Open-source self-hosted AI tooling has advanced significantly in recent months, enabling capabilities such as summarization, conversational assistants, and real-time speech recognition on moderate hardware. Fly.io supports machine learning inference at the edge on enterprise-grade GPUs such as the Nvidia A100, and GPU Machines can scale to zero so users pay only for what they use, when they need it. The post demonstrates this with Ollama, a wrapper around llama.cpp that lets users run large language models on their own hardware with GPU acceleration.
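The scale-to-zero setup the post describes can be sketched as a minimal `fly.toml`. This is an illustrative fragment, not the post's actual configuration: the app name, region, and image are assumptions, and it relies on Fly.io's auto-stop/auto-start Machine settings.

```toml
app = "ollama-demo"           # hypothetical app name
primary_region = "ord"        # illustrative region choice

[build]
  image = "ollama/ollama"     # official Ollama container image

[http_service]
  internal_port = 11434       # Ollama's default API port
  auto_stop_machines = true   # stop Machines when traffic goes idle
  auto_start_machines = true  # restart a Machine on incoming requests
  min_machines_running = 0    # allow scaling all the way to zero
```

With `min_machines_running = 0`, the Fly proxy stops idle Machines and cold-starts one when a request arrives, which is what makes the pay-only-for-what-you-use model work.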

Company
Fly.io

Date published
Dec. 6, 2023

Author(s)
Xe Iaso

Word count
2044

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.