| Fireworks DevDay 2025 Wrapped | Oct 06, 2025 | 990 |
| Why do all LLMs need structured output modes? | Oct 06, 2025 | 2806 |
| New in Fireworks: Image-to-Image and ControlNet support for SSD-1B and SDXL! | Oct 06, 2025 | 859 |
| Announcing custom models and on-demand H100s with 50%+ lower costs and latency than vLLM | Oct 06, 2025 | 1121 |
| Fireworks Real-World Benchmarks: Find the Best OSS Model for the Job | Oct 06, 2025 | 765 |
| Introducing OpenAI gpt-oss (20b & 120b) | Oct 06, 2025 | 872 |
| Quality first: how Fireworks.ai is the go-to place for gpt-oss | Oct 06, 2025 | 1094 |
| Audio September Release - Streaming Transcription V2 and Streaming Speaker Diarization | Oct 06, 2025 | 789 |
| Partnering with Meta to bring Llama 3 to Fireworks’ inference and fine-tuning | Oct 06, 2025 | 800 |
| Document inlining: Crossing the modality gap with Compound AI | Oct 06, 2025 | 1685 |
| 20x faster Whisper than OpenAI - Fireworks audio transcribes 1 hour in 4 seconds | Oct 06, 2025 | 1346 |
| Agentic AI Systems | Oct 06, 2025 | 1946 |
| Introducing Supervised Fine Tuning V2 | Oct 06, 2025 | 789 |
| Understanding Function Calling: The Bridge to Agentic AI | Oct 06, 2025 | 1251 |
| Fireworks.ai Achieves SOC 2 Type II and HIPAA Compliance | Oct 06, 2025 | 416 |
| Build customizable, real-time voice agents with Fireworks Voice Agent Platform (Beta) | Oct 06, 2025 | 889 |
| FireAttention — Serving Open Source Models 4x faster than vLLM by quantizing with ~no tradeoffs | Oct 06, 2025 | 1336 |
| Mixtral 8x7B on Fireworks: faster, cheaper, even before the official release | Oct 06, 2025 | 844 |
| VibeRL: When AI Trains AI | Oct 06, 2025 | 749 |
| How Cresta drives millions of real-time, AI-powered contact center interactions with Fireworks | Oct 06, 2025 | 1108 |
| Fireworks Summer Audio Updates: Fastest Transcription now with Diarization and Batch API | Oct 06, 2025 | 1362 |
| Multi-LoRA: Personalize AI at scale and deliver the best experience for each customer and use case, with 100x cost-efficiency | Oct 06, 2025 | 1350 |
| Multi-Query Attention is All You Need | Oct 06, 2025 | 3781 |
| Partnering with Meta: Bringing Llama 3.2 to Fireworks for Fine-Tuning and Inference | Oct 06, 2025 | 1777 |
| Deep-dive into MuonClip: Fixing Attention Score Explosions in Transformer Training | Oct 06, 2025 | 2759 |
| Deep-Dive into LLM Fine-Tuning | Oct 06, 2025 | 1987 |
| Enabling Function Calling in DeepSeek v3: Bridging the Gap Between Text and Action | Oct 06, 2025 | 2220 |
| Simplifying Code Infilling with Code Llama and Fireworks.ai | Oct 06, 2025 | 443 |
| Faster, more efficient DeepSeek on the Fireworks AI Developer Cloud | Oct 06, 2025 | 425 |
| Fireworks AI Now Supports Amazon SageMaker | Oct 06, 2025 | 488 |
| Vision Model Platform Updates: Enhanced Capabilities and New Features | Oct 06, 2025 | 1174 |
| FireAttention V4: Industry-Leading Latency and Cost Efficiency with FP4 | Oct 06, 2025 | 1086 |
| Fireworks Streaming Transcription: 300ms with Whisper-v3-large-quality | Oct 06, 2025 | 1119 |
| Kimi K2: Deep Dive into model performance and use-cases | Oct 06, 2025 | 1051 |
| DeepSeek V3 just got vision capabilities! | Oct 06, 2025 | 525 |
| Build Your Own Flight Recommendation System using FastAPI, SerpAPI, and Firefunction | Oct 06, 2025 | 4353 |
| Introducing Llama 3.1 inference endpoints in partnership with Meta | Oct 06, 2025 | 874 |
| FireFunction V1 - Fireworks’ GPT-4-level function calling model - 4x faster than GPT-4 and open weights | Oct 06, 2025 | 1647 |
| FireOptimizer: Customizing latency and quality for your production inference workload | Oct 06, 2025 | 1736 |
| Beyond Supervised Fine Tuning: How Reinforcement Learning Empowers AI with Minimal Labels | Oct 06, 2025 | 1970 |
| Test-Driven Agent Development with Eval Protocol | Oct 06, 2025 | 1569 |
| How Fireworks evaluates quantization precisely and interpretably | Oct 06, 2025 | 2301 |
| Qwen 3 on Fireworks AI: Controllable Chain-of-Thought and Tool Calling at Frontier Scale | Oct 06, 2025 | 869 |
| Fireworks launches fine-tuning service - Rapidly iterate on quality and scale to production through Fireworks inference | Oct 06, 2025 | 1189 |
| Traces Are All You Need (to rank LLMs) | Oct 06, 2025 | 2174 |
| Optimizing Retrieval Augmented Generation (RAG) with MongoDB Atlas and Fireworks AI | Oct 06, 2025 | 1980 |
| Firefunction-v2: Function calling capability on par with GPT4o at 2.5x the speed and 10% of the cost | Oct 06, 2025 | 1737 |
| Fireworks.ai Now Available on LangChain Prompt Playground | Oct 06, 2025 | 821 |
| Mistral Small 3 Now Available on Fireworks: Faster, Lighter, and More Efficient | Oct 06, 2025 | 407 |
| Fireworks Raises the Quality Bar with Function Calling Model and API Release | Oct 06, 2025 | 2257 |
| Building a RAG with Astro, FastAPI, SurrealDB and Llama 3.1 | Oct 06, 2025 | 3598 |
| Launching Fireworks for Startups Program! | Oct 06, 2025 | 495 |
| Global Fast Food Group Transforms Drive-Thru with Real-Time Voice Intelligence with Fireworks | Oct 06, 2025 | 1019 |
| Introducing FLUX.1 Kontext on Fireworks | Oct 06, 2025 | 408 |
| Fireworks Platform Spring 2024 Updates | Oct 06, 2025 | 1609 |
| Fine-Tuning DeepSeek v3 & R1 to optimize quality, latency, & cost | Oct 06, 2025 | 963 |
| Building a High‑Quality Synthetic Data Pipeline for Supervised Fine‑Tuning | Oct 06, 2025 | 996 |
| Code Generation with Large Language Models - Fireworks AI Take | Oct 06, 2025 | 1561 |
| DeepSeek R1 Just Got Eyes with Fireworks AI Document Inlining | Oct 06, 2025 | 2265 |
| Three projects, one platform: A developer's winning streak with Fireworks AI | Oct 06, 2025 | 1600 |
| DeepSeek v3 and R1 Model Architecture: Why it's powerful and economical | Oct 06, 2025 | 1761 |
| Building an open-source Browser Agent on Fireworks AI | Oct 06, 2025 | 2718 |
| FireLLaVA: the first commercially permissive OSS LLaVA model | Oct 06, 2025 | 991 |
| Your AI Benchmark is Lying to You. Here's How We Caught It | Oct 06, 2025 | 4163 |
| Introducing Vision-Language Model Fine-tuning: Tailor VLMs to Your Domain | Oct 06, 2025 | 938 |
| Supervised Fine-Tuning (SFT) with LoRA on Fireworks AI: Tutorial | Oct 06, 2025 | 1034 |
| Run bulk async workloads with Fireworks Batch API | Oct 06, 2025 | 450 |
| Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans? | Oct 06, 2025 | 905 |
| Qwen3 Decoded: Choosing the Right Model For Your Task | Oct 06, 2025 | 2790 |
| Build for Scale with Fireworks Virtual Cloud (GA) | Oct 06, 2025 | 1128 |
| Building Enterprise-Scale RAG Systems with Fireworks AI and MongoDB Atlas | Oct 06, 2025 | 1745 |
| Building AI agents with the Fireworks Experimentation Platform (GA) and Build SDK (Beta) | Oct 06, 2025 | 1301 |
| FLUX.1 on Fireworks: Fast, frugal, and flexible | Oct 06, 2025 | 1137 |
| LLM Eval Driven Development with Claude Code | Oct 06, 2025 | 1454 |
| Unlock Your Tools: Fireworks Adds OpenAI-Response API with MCP Support (Beta) | Oct 06, 2025 | 1152 |
| Fireworks.ai: Fast, Affordable, Customizable Gen AI Platform | Oct 06, 2025 | 1614 |
| Understanding Embeddings and Reranking at Scale | Oct 06, 2025 | 1612 |
| From text to task: Constrained generation for structured extraction in R1 | Oct 06, 2025 | 5992 |
| LLM Inference Performance Benchmarking (Part 1) | Oct 06, 2025 | 747 |
| Using Model-as-a-Judge for Reward in Reinforcement Fine Tuning | Oct 06, 2025 | 824 |
| GPUs on-demand: Not serverless, not reserved, but some third thing | Oct 06, 2025 | 1670 |
| Announcing Eval Protocol | Oct 06, 2025 | 829 |
| How Upwork and Fireworks deliver faster, smarter proposals for freelancers | Oct 06, 2025 | 1026 |
| Fireworks f1: A breakthrough in complex reasoning with Compound AI | Oct 06, 2025 | 605 |
| How Cursor built Fast Apply using the Speculative Decoding API | Oct 06, 2025 | 1052 |
| Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound AI Systems | Oct 06, 2025 | 1132 |
| Speed, Python: Pick Two. How CUDA Graphs Enable Fast Python Code for Deep Learning | Oct 06, 2025 | 3679 |
| Production-Ready AI Agents with Optimized Inference with AWS AgentCore | Oct 06, 2025 | 451 |
| Doomed to Code: How we Teamed Up with Fireworks AI at MistralAI Hackathon to Conquer the Shores of Hell | Oct 06, 2025 | 3100 |
| Sentient & Fireworks Powers Decentralized AI At Viral Scale | Oct 06, 2025 | 1412 |
| FireAttention V2: 12x faster to make Long Contexts practical for Online Inference | Oct 06, 2025 | 891 |
| Announcing Embeddings and Reranking On Fireworks AI | Oct 15, 2025 | 899 |
| Optimizing Llama 4 Maverick on Fireworks AI | Oct 06, 2025 | 1205 |
| DeepSeek V3.1 now on Fireworks AI! | Oct 06, 2025 | 653 |
| How Enterprises are using Multimodal Models in production with Fireworks | Oct 06, 2025 | 686 |
| Fireworks AI Now Supports NVIDIA NIM Deployments for Blazing AI Inference | Oct 06, 2025 | 884 |
| 3D FireOptimizer: Automating the Multi-Dimensional Tradeoffs in LLM Serving | Oct 06, 2025 | 1447 |
| Accelerating Code Completion with Fireworks Fast LLM Inference | Oct 06, 2025 | 639 |
| Real-time, performant code assistance: How Sourcegraph scaled with Fireworks AI | Oct 06, 2025 | 1220 |
| Reinforcement Fine Tuning (Beta): Train expert open models to surpass closed frontier models | Oct 06, 2025 | 958 |
| FireAttention V3: Enabling AMD as a viable alternative for GPU inference | Oct 06, 2025 | 1910 |
| DeepSeek R1: All you need to know 🐳 | Oct 06, 2025 | 1502 |
| Getting Started with Stability’s API Powered by Fireworks | Oct 06, 2025 | 1040 |
| How Notion Cuts Latency 4x and Scales Enterprise AI Workflows with Fireworks AI | Oct 06, 2025 | 584 |
| LLM on the edge: Model picking with Fireworks Eval Protocol + Ollama | Oct 18, 2025 | 912 |
| Fireworks and AMD partner to power the next gen of AI infrastructure on AMD Instinct™ GPUs | Oct 20, 2025 | 415 |
| Deployment Shapes: One-Click Deployment Configured For You | Oct 24, 2025 | 875 |
| We raised $250M To Help Enterprises Own Their AI | Oct 28, 2025 | 818 |
| Accelerate your Vision Pipelines with the new NVIDIA Nemotron Nano 2 VL Model on Fireworks AI | Oct 27, 2025 | 831 |
| Genspark’s Deep Research Agent Outperforms a Frontier Closed Model in Quality and Tool Calls using Fireworks RFT, Achieving a 50% Cost Reduction | Nov 01, 2025 | 1126 |
| 40x Faster and Smarter Outputs: How Vercel Turbocharged their Code Fixing Model with Open Models, Speculative Decoding and Reinforcement Fine Tuning on Fireworks | Oct 31, 2025 | 1086 |
| Fireworks RFT: Build AI agents with fine-tuned open models that outperform frontier closed models | Nov 11, 2025 | 1046 |
| Modernizing Healthcare with AI: How RADPAIR and Fireworks Unlock Smarter Radiology Workflows | Nov 09, 2025 | 2408 |
| 50 Trillion Tokens Per Day: The State of Agent Environments | Nov 19, 2025 | 2411 |
| Fireworks Achieves Triple ISO Certification, giving Enterprises Full Control and Trust in AI at Scale | Nov 20, 2025 | 771 |
| Eval Protocol: RL on your agents, in any environment | Nov 21, 2025 | 1400 |