Fireworks AI Blog

Blog URL

fireworks.ai/blog

Posts YTD

34 ↑ vs 28 last year

Avg Posts/Month

4.1 since 2024

Monthly Post Volume

Post Details

Title	Author	Published	Words	HN Points
Fireworks DevDay 2025 Wrapped	--	2025-05-29	963	--
Why do all LLMs need structured output modes?	--	2024-02-20	2,766	--
Announcing custom models and on-demand H100s with 50%+ lower costs and latency …	--	2024-06-03	1,072	--
Fireworks Real-World Benchmarks: Find the Best OSS Model for the Job	--	2025-07-30	681	--
Introducing OpenAI gpt-oss (20b & 120b)	--	2025-08-05	804	--
Quality first: how Fireworks.ai is the go-to place for gpt-oss	--	2025-08-12	1,030	--
Audio September Release - Streaming Transcription V2 and Streaming Speaker Diarization	--	2025-10-06	789	--
Partnering with Meta to bring Llama 3 to Firework’s inference and fine-tuning	--	2024-04-18	729	--
Document inlining: Crossing the modality gap with Compound AI	--	2025-10-06	1,685	--
20x faster Whisper than OpenAI - Fireworks audio transcribes 1 hour in …	--	2024-12-09	1,307	--
Agentic AI Systems	--	2025-05-19	1,900	--
Introducing Supervised Fine Tuning V2	--	2025-06-13	735	--
Understanding Function Calling: The Bridge to Agentic AI	--	2025-07-11	1,203	--
Build customizable, real-time voice agents with Fireworks Voice Agent Platform (Beta)	--	2025-10-06	889	--
FireAttention — Serving Open Source Models 4x faster than vLLM by quantizing …	--	2024-01-08	1,278	--
VibeRL: When AI Trains AI	--	2025-07-22	697	--
How Cresta drives millions of real-time, AI-powered contact center interactions with Fireworks	--	2024-12-08	1,085	--
Fireworks Summer Audio Updates: Fastest Transcription now with Diarization and Batch API	--	2025-10-06	1,362	--
Multi-LoRA: Personalize AI at scale and deliver the best experience for each …	--	2024-09-18	1,201	--
Partnering with Meta: Bringing Llama 3.2 to Fireworks for Fine-Tuning and Inference	--	2024-09-25	1,689	--
Deep-dive into MuonClip: Fixing Attention Score Explosions in Transformer Training	--	2025-07-15	2,699	--
Deep-Dive into LLM Fine-Tuning	--	2025-10-06	1,976	--
Enabling Function Calling in DeepSeek v3: Bridging the Gap Between Text and …	--	2025-02-14	2,159	--
Faster, more efficient DeepSeek on the Fireworks AI Developer Cloud	--	2025-03-18	386	--
Fireworks AI Now Supports Amazon SageMaker	--	2025-07-15	448	--
Vision Model Platform Updates: Enhanced Capabilities and New Features	--	2025-06-12	1,133	--
FireAttention V4: Industry-Leading Latency and Cost Efficiency with FP4	--	2025-05-28	1,011	--
Fireworks Streaming Transcription: 300ms with Whisper-v3-large-quality	--	2025-10-06	1,119	--
Kimi K2: Deep Dive into model performance and use-cases	--	2025-08-01	989	--
DeepSeek V3 just got vision capabilities!	--	2024-12-18	471	--
Build Your Own Flight Recommendation System using FastAPI, SerpAPI, and Firefunction	--	2024-08-29	4,283	--
Introducing Llama 3.1 inference endpoints in partnership with Meta	--	2024-07-23	805	--
FireFunction V1 - Fireworks’ GPT-4-level function calling model - 4x faster than …	--	2024-02-20	1,598	--
FireOptimizer: Customizing latency and quality for your production inference workload	--	2024-08-30	1,685	--
Beyond Supervised Fine Tuning: How Reinforcement Learning Empowers AI with Minimal Labels	--	2025-01-27	1,905	--
Test-Driven Agent Development with Eval Protocol	--	2025-08-14	1,501	--
How Fireworks evaluates quantization precisely and interpretably	--	2024-08-01	2,277	--
Qwen 3 on Fireworks AI: Controllable Chain-of-Thought and Tool Calling at Frontier …	--	2025-05-06	815	--
Fireworks launches fine-tuning service - Rapidly iterate on quality and scale to …	--	2024-03-08	1,138	--
Traces Are All You Need (to rank LLMs)	--	2025-09-22	2,091	--
Optimizing Retrieval Augmented Generation (RAG) with MongoDB Atlas and Fireworks AI	--	2024-03-21	1,904	--
Firefunction-v2: Function calling capability on par with GPT4o at 2.5x the speed …	--	2024-06-17	1,684	--
Mistral Small 3 Now Available on Fireworks: Faster, Lighter, and More Efficient	--	2025-01-30	347	--
Building a RAG with Astro, FastAPI, SurrealDB and Llama 3.1	--	2024-08-14	3,514	--
Launching Fireworks for Startups Program!	--	2025-10-01	473	--
Global Fast Food Group Transforms Drive-Thru with Real-Time Voice Intelligence with Fireworks	--	2025-10-06	1,019	--
Introducing FLUX.1 Kontext on Fireworks	--	2025-07-09	372	--
Fireworks Platform Spring 2024 Updates	--	2024-03-01	1,572	--
Fine-Tuning DeepSeek v3 & R1 to optimize quality, latency, & cost	--	2025-03-12	890	--
Building a High‑Quality Synthetic Data Pipeline for Supervised Fine‑Tuning	--	2025-06-04	972	--
Code Generation with Large Language Models - Fireworks AI Take	--	2024-05-08	1,466	--
DeepSeek R1 Just Got Eyes with Fireworks AI Document Inlining	--	2025-02-05	2,194	--
Three projects, one platform: A developer's winning streak with Fireworks AI	--	2024-10-14	1,561	--
DeepSeek v3 and R1 Model Architecture: Why it's powerful and economical	--	2025-02-07	1,663	--
Building an open-source Browser Agent on Fireworks AI	--	2025-05-21	2,613	--
FireLLaVA: the first commercially permissive OSS LLaVA model	--	2024-01-18	933	--
Your AI Benchmark is Lying to You. Here's How We Caught It	--	2025-08-15	4,108	--
Introducing Vision-Language Model Fine-tuning: Tailor VLMs to Your Domain	--	2025-07-29	885	--
Supervised Fine-Tuning (SFT) with LoRA on Fireworks AI: Tutorial	--	2025-05-12	1,037	--
Run bulk async workloads with Fireworks Batch API	--	2025-07-31	419	--
Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?	--	2025-01-31	853	--
Qwen3 Decoded: Choosing the Right Model For Your Task	--	2025-08-01	2,790	--
Build for Scale with Fireworks Virtual Cloud (GA)	--	2025-06-16	1,088	--
Building Enterprise-Scale RAG Systems with Fireworks AI and MongoDB Atlas	--	2025-04-09	1,702	--
Building AI agents with the Fireworks Experimentation Platform (GA) and Build SDK …	--	2025-06-11	1,241	--
FLUX.1 on Fireworks: Fast, frugal, and flexible	--	2024-10-22	1,107	--
LLM Eval Driven Development with Claude Code	--	2025-08-25	1,394	--
Unlock Your Tools: Fireworks Adds OpenAI-Response API with MCP Support (Beta)	--	2025-06-22	1,088	--
Understanding Embeddings and Reranking at Scale	--	2025-09-12	1,546	--
From text to task: Constrained generation for structured extraction in R1	--	2025-02-01	5,968	--
Using Model-as-a-Judge for Reward in Reinforcement Fine Tuning	--	2025-07-10	765	--
GPUs on-demand: Not serverless, not reserved, but some third thing	--	2024-06-03	1,648	--
Announcing Eval Protocol	--	2025-08-04	783	--
How Upwork and Fireworks deliver faster, smarter proposals for freelancers	--	2024-11-11	990	--
Fireworks f1: A breakthrough in complex reasoning with Compound AI	--	2024-11-15	535	--
How Cursor built Fast Apply using the Speculative Decoding API	--	2024-06-23	997	--
Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound …	--	2024-07-11	1,070	--
Production-Ready AI Agents with Optimized Inference with AWS AgentCore	--	2025-10-02	401	--
Doomed to Code: How we Teamed Up with Fireworks AI at MistralAI …	--	2024-05-06	3,024	--
Sentient & Fireworks Powers Decentralized AI At Viral Scale	--	2025-07-17	1,333	--
FireAttention V2: 12x faster to make Long Contexts practical for Online Inference	--	2024-06-20	848	--
Announcing Embeddings and Reranking On Fireworks AI	--	2025-10-09	870	--
Optimizing Llama 4 Maverick on Fireworks AI	--	2025-04-28	1,151	--
DeepSeek V3.1 now on Fireworks AI!	--	2025-08-26	613	--
How Enterprises are using Multimodal Models in production with Fireworks	--	2024-09-25	596	--
Fireworks AI Now Supports NVIDIA NIM Deployments for Blazing AI Inference	--	2025-03-18	851	--
3D FireOptimizer: Automating the Multi-Dimensional Tradeoffs in LLM Serving	--	2025-06-14	1,385	--
Real-time, performant code assistance: How Sourcegraph scaled with Fireworks AI	--	2025-01-22	1,214	--
Reinforcement Fine Tuning (Beta): Train expert open models to surpass closed frontier …	--	2025-06-09	885	--
FireAttention V3: Enabling AMD as a viable alternative for GPU inference	--	2024-10-15	1,856	--
DeepSeek R1: All you need to know 🐳	--	2025-01-24	1,431	--
Getting Started with Stability’s API Powered by Fireworks	--	2024-04-17	987	--
How Notion Cuts Latency 4x and Scales Enterprise AI Workflows with Fireworks …	--	2025-07-25	514	--
LLM on the edge: Model picking with Fireworks Eval Protocol + Ollama	--	2025-10-15	852	--
Fireworks and AMD partner to power the next gen of AI infrastructure …	--	2025-10-20	341	--
Deployment Shapes: One-Click Deployment Configured For You	--	2025-10-23	828	--
We raised $250M To Help Enterprises Own Their AI	--	2025-10-28	788	--
Accelerate your Vision Pipelines with the new NVIDIA Nemotron Nano 2 VL …	--	2025-10-27	793	--
Genspark’s Deep Research Agent Outperforms a Frontier Closed Model in Quality and …	--	2025-10-31	1,063	--
40X Faster, and Smarter Outputs: How Vercel Turbocharged their Code Fixing Model …	--	2025-11-03	1,025	--
Fireworks RFT: Build AI agents with fine-tuned open models that outperform frontier …	--	2025-11-10	1,023	--
Modernizing Healthcare with AI: How RADPAIR and Fireworks Unlock Smarter Radiology Workflows	--	2025-11-09	2,365	--
50 Trillion Tokens Per Day: The State of Agent Environments	--	2025-11-19	2,333	--
Fireworks Achieves Triple ISO Certification, giving Enterprises Full Control and Trust in …	--	2025-11-19	739	--
Eval Protocol: RL on your agents, in any environment	--	2025-11-20	1,316	--
Fireworks Expands AWS Alliance: Strategic Collaboration Agreement + GenAI Competency	--	2025-11-24	593	--
Unlock Advanced Reasoning with NVIDIA Nemotron Nano 2 Models on Fireworks AI	--	2025-12-02	1,294	--
Turn Your LLM into a Calibrated Classifier for $2	--	2025-12-04	2,523	--
Best Practices for Multi-Turn RL	--	2025-12-10	2,796	--
NVIDIA Nemotron 3 Nano on Fireworks: The Engine for Next-Generation AI Agents	--	2025-12-15	787	--
Self-Improving Agents, Powered by Your Evals	--	2025-12-17	1,339	--
DPO, your simplest RL pipeline with two rollouts	--	2025-12-31	3,074	--
A Deep Dive into MLA training/inference difference and why QK-Clip from Kimi …	--	2025-07-22	5,418	--
Turning Production Logs into Evaluation Datasets: A Data-Driven Approach	--	2026-01-23	1,274	--
Kimi K2.5 is Live on Fireworks: Vibe Coding, Agents, and Full-Parameter RFT	--	2026-01-26	790	--
Build powerful agents on OSS models with Blazing Fast Inference on Fireworks	--	2026-01-27	379	--
The Missing Piece of the OpenClaw Mania: Truly ‘Own Your AI’ with …	--	2026-01-30	964	--
The Benchmark Gap: What It Takes to Ship Kimi K2.5	--	2026-02-03	2,040	--
Training-Inference Parity in MoE Models: Where Numerics Drift	--	2026-03-10	2,902	--
Fireworks Acquires Hathora to Accelerate Global Compute Orchestration	--	2026-03-10	458	--
Introducing Fireworks on Microsoft Foundry: Bringing Best-in-Class Open Model inference to Azure	--	2026-03-08	741	--
Best Open Source LLMs in 2026: We Reviewed 7 Models	--	2026-01-13	5,177	--
Why Building Mega Clusters Is Wrong	--	2026-03-10	2,382	--
Frontier RL Is Cheaper Than You Think	--	2026-03-23	2,138	--
The Fine-Tuning Bottleneck Isn't the Algorithm	--	2026-03-28	1,800	--
Scaling and Optimizing Frontier Model Training	--	2026-04-03	2,555	--
[staged] Introducing The Inference Fabric: Own Your AI	--	2026-04-06	2,309	--
Own Your AI: Fireworks Training Preview	--	2026-04-06	1,291	--
The DeepSeek Model Lineup: V3.2, R1, and Distilled Variants Mapped to Production …	--	2026-02-27	2,543	--
How We Protect from Prompt Injection on Fireworks AI	--	2026-04-03	2,138	--
How we fixed prompt injection for all models on Fireworks	--	2026-04-24	2,088	--
Notes on DeepSeek-V4's training system	--	2026-04-24	2,332	--
DeepSeek V4 Pro: Validating Frontier Models For Production	--	2026-04-27	1,272	--
Best LLMs for coding in 2026	--	2026-03-02	9,636	--
Innovative Solutions Rebuilds Enterprise Services Delivery with Fireworks AI	--	2026-05-05	1,403	--
Agents Don't Fail on Intelligence. They Fail on Execution.	--	2026-05-20	5,118	--
The Best 8 LLM API Providers in 2026	--	2026-03-04	10,131	--
Serverless 2.0: Three Ways to Run Inference, One API	--	2026-05-26	1,728	--
Trilogy Validates Open-Weight AI Models for Enterprise AI Workloads with Fireworks	--	2026-06-01	1,206	--
Open-source agents with frontier advisors: matching frontier performance through training and harness …	--	2026-06-03	2,368	--
NVIDIA Nemotron 3 Ultra is live on Fireworks, day zero	--	2026-06-04	531	--
Inference Providers vs. API Routers: where do tokens come from?	--	2026-03-06	1,575	--
MiniMax M3 is live: long context + native multimodality at 1/20th the …	--	2026-06-12	1,160	--
Qwen 3.7 Plus on Fireworks: Run it today.	--	2026-06-12	1,189	--
Kimi K2.7 Code on Fireworks: Better Agents, Lower Cost per Task, Available …	--	2026-06-12	821	--
GLM 5.2 is live on Fireworks inference, day zero.	--	2026-06-16	1,112	--
Fireworks is moving to prepaid billing on July 1st	--	2026-06-18	608	--

Plushcap, by Matt Makai. 2021-2026.