| Fireworks DevDay 2025 Wrapped | Oct 06, 2025 | 990 |
| Why do all LLMs need structured output modes? | Oct 06, 2025 | 2806 |
| New in Fireworks: Image-to-Image and ControlNet support for SSD-1B and SDXL! | Oct 06, 2025 | 859 |
| Announcing custom models and on-demand H100s with 50%+ lower costs and latency than vLLM | Oct 06, 2025 | 1121 |
| Fireworks Real-World Benchmarks: Find the Best OSS Model for the Job | Oct 06, 2025 | 765 |
| Introducing OpenAI gpt-oss (20b & 120b) | Oct 06, 2025 | 872 |
| Quality first: how Fireworks.ai is the go-to place for gpt-oss | Oct 06, 2025 | 1094 |
| Audio September Release - Streaming Transcription V2 and Streaming Speaker Diarization | Oct 06, 2025 | 789 |
| Partnering with Meta to bring Llama 3 to Fireworks’ inference and fine-tuning | Oct 06, 2025 | 800 |
| Document inlining: Crossing the modality gap with Compound AI | Oct 06, 2025 | 1685 |
| 20x faster Whisper than OpenAI - Fireworks audio transcribes 1 hour in 4 seconds | Oct 06, 2025 | 1346 |
| Agentic AI Systems | Oct 06, 2025 | 1946 |
| Introducing Supervised Fine Tuning V2 | Oct 06, 2025 | 789 |
| Understanding Function Calling: The Bridge to Agentic AI | Oct 06, 2025 | 1251 |
| Fireworks.ai Achieves SOC 2 Type II and HIPAA Compliance | Oct 06, 2025 | 416 |
| Build customizable, real-time voice agents with Fireworks Voice Agent Platform (Beta) | Oct 06, 2025 | 889 |
| FireAttention — Serving Open Source Models 4x faster than vLLM by quantizing with ~no tradeoffs | Oct 06, 2025 | 1336 |
| Mixtral 8x7B on Fireworks: faster, cheaper, even before the official release | Oct 06, 2025 | 844 |
| VibeRL: When AI Trains AI | Oct 06, 2025 | 749 |
| How Cresta drives millions of real-time, AI-powered contact center interactions with Fireworks | Oct 06, 2025 | 1108 |
| Fireworks Summer Audio Updates: Fastest Transcription now with Diarization and Batch API | Oct 06, 2025 | 1362 |
| Multi-LoRA: Personalize AI at scale and deliver the best experience for each customer and use case, with 100x cost-efficiency | Oct 06, 2025 | 1350 |
| Multi-Query Attention is All You Need | Oct 06, 2025 | 3781 |
| Partnering with Meta: Bringing Llama 3.2 to Fireworks for Fine-Tuning and Inference | Oct 06, 2025 | 1777 |
| Deep-dive into MuonClip: Fixing Attention Score Explosions in Transformer Training | Oct 06, 2025 | 2759 |
| Deep-Dive into LLM Fine-Tuning | Oct 06, 2025 | 1987 |
| Enabling Function Calling in DeepSeek v3: Bridging the Gap Between Text and Action | Oct 06, 2025 | 2220 |
| Simplifying Code Infilling with Code Llama and Fireworks.ai | Oct 06, 2025 | 443 |
| Faster, more efficient DeepSeek on the Fireworks AI Developer Cloud | Oct 06, 2025 | 425 |
| Fireworks AI Now Supports Amazon SageMaker | Oct 06, 2025 | 488 |
| Vision Model Platform Updates: Enhanced Capabilities and New Features | Oct 06, 2025 | 1174 |
| FireAttention V4: Industry-Leading Latency and Cost Efficiency with FP4 | Oct 06, 2025 | 1086 |
| Fireworks Streaming Transcription: 300ms with Whisper-v3-large-quality | Oct 06, 2025 | 1119 |
| Kimi K2: Deep Dive into model performance and use-cases | Oct 06, 2025 | 1051 |
| DeepSeek V3 just got vision capabilities! | Oct 06, 2025 | 525 |
| Build Your Own Flight Recommendation System using FastAPI, SerpAPI, and Firefunction | Oct 06, 2025 | 4353 |
| Introducing Llama 3.1 inference endpoints in partnership with Meta | Oct 06, 2025 | 874 |
| FireFunction V1 - Fireworks’ GPT-4-level function calling model - 4x faster than GPT-4 and open weights | Oct 06, 2025 | 1647 |
| FireOptimizer: Customizing latency and quality for your production inference workload | Oct 06, 2025 | 1736 |
| Beyond Supervised Fine Tuning: How Reinforcement Learning Empowers AI with Minimal Labels | Oct 06, 2025 | 1970 |
| Test-Driven Agent Development with Eval Protocol | Oct 06, 2025 | 1569 |
| How Fireworks evaluates quantization precisely and interpretably | Oct 06, 2025 | 2301 |
| Qwen 3 on Fireworks AI: Controllable Chain-of-Thought and Tool Calling at Frontier Scale | Oct 06, 2025 | 869 |
| Fireworks launches fine-tuning service - Rapidly iterate on quality and scale to production through Fireworks inference | Oct 06, 2025 | 1189 |
| Traces Are All You Need (to rank LLMs) | Oct 06, 2025 | 2174 |
| Optimizing Retrieval Augmented Generation (RAG) with MongoDB Atlas and Fireworks AI | Oct 06, 2025 | 1980 |
| Firefunction-v2: Function calling capability on par with GPT4o at 2.5x the speed and 10% of the cost | Oct 06, 2025 | 1737 |
| Fireworks.ai Now Available on LangChain Prompt Playground | Oct 06, 2025 | 821 |
| Mistral Small 3 Now Available on Fireworks: Faster, Lighter, and More Efficient | Oct 06, 2025 | 407 |
| Fireworks Raises the Quality Bar with Function Calling Model and API Release | Oct 06, 2025 | 2257 |
| Building a RAG with Astro, FastAPI, SurrealDB and Llama 3.1 | Oct 06, 2025 | 3598 |
| Launching Fireworks for Startups Program! | Oct 06, 2025 | 495 |
| Global Fast Food Group Transforms Drive-Thru with Real-Time Voice Intelligence with Fireworks | Oct 06, 2025 | 1019 |
| Introducing FLUX.1 Kontext on Fireworks | Oct 06, 2025 | 408 |
| Fireworks Platform Spring 2024 Updates | Oct 06, 2025 | 1609 |
| Fine-Tuning DeepSeek v3 & R1 to optimize quality, latency, & cost | Oct 06, 2025 | 963 |
| Building a High‑Quality Synthetic Data Pipeline for Supervised Fine‑Tuning | Oct 06, 2025 | 996 |
| Code Generation with Large Language Models - Fireworks AI Take | Oct 06, 2025 | 1561 |
| DeepSeek R1 Just Got Eyes with Fireworks AI Document Inlining | Oct 06, 2025 | 2265 |
| Three projects, one platform: A developer's winning streak with Fireworks AI | Oct 06, 2025 | 1600 |
| DeepSeek v3 and R1 Model Architecture: Why it's powerful and economical | Oct 06, 2025 | 1761 |
| Building an open-source Browser Agent on Fireworks AI | Oct 06, 2025 | 2718 |
| FireLLaVA: the first commercially permissive OSS LLaVA model | Oct 06, 2025 | 991 |
| Your AI Benchmark is Lying to You. Here's How We Caught It | Oct 06, 2025 | 4163 |
| Introducing Vision-Language Model Fine-tuning: Tailor VLMs to Your Domain | Oct 06, 2025 | 938 |
| Supervised Fine-Tuning (SFT) with LoRA on Fireworks AI: Tutorial | Oct 06, 2025 | 1034 |
| Run bulk async workloads with Fireworks Batch API | Oct 06, 2025 | 450 |
| Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans? | Oct 06, 2025 | 905 |
| Qwen3 Decoded: Choosing the Right Model For Your Task | Oct 06, 2025 | 2790 |
| Build for Scale with Fireworks Virtual Cloud (GA) | Oct 06, 2025 | 1128 |
| Building Enterprise-Scale RAG Systems with Fireworks AI and MongoDB Atlas | Oct 06, 2025 | 1745 |
| Building AI agents with the Fireworks Experimentation Platform (GA) and Build SDK (Beta) | Oct 06, 2025 | 1301 |
| FLUX.1 on Fireworks: Fast, frugal, and flexible | Oct 06, 2025 | 1137 |
| LLM Eval Driven Development with Claude Code | Oct 06, 2025 | 1454 |
| Unlock Your Tools: Fireworks Adds OpenAI-Response API with MCP Support (Beta) | Oct 06, 2025 | 1152 |
| Fireworks.ai: Fast, Affordable, Customizable Gen AI Platform | Oct 06, 2025 | 1614 |
| Understanding Embeddings and Reranking at Scale | Oct 06, 2025 | 1612 |
| From text to task: Constrained generation for structured extraction in R1 | Oct 06, 2025 | 5992 |
| LLM Inference Performance Benchmarking (Part 1) | Oct 06, 2025 | 747 |
| Using Model-as-a-Judge for Reward in Reinforcement Fine Tuning | Oct 06, 2025 | 824 |
| GPUs on-demand: Not serverless, not reserved, but some third thing | Oct 06, 2025 | 1670 |
| Announcing Eval Protocol | Oct 06, 2025 | 829 |
| How Upwork and Fireworks deliver faster, smarter proposals for freelancers | Oct 06, 2025 | 1026 |
| Fireworks f1: A breakthrough in complex reasoning with Compound AI | Oct 06, 2025 | 605 |
| How Cursor built Fast Apply using the Speculative Decoding API | Oct 06, 2025 | 1052 |
| Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound AI Systems | Oct 06, 2025 | 1132 |
| Speed, Python: Pick Two. How CUDA Graphs Enable Fast Python Code for Deep Learning | Oct 06, 2025 | 3679 |
| Production-Ready AI Agents with Optimized Inference with AWS AgentCore | Oct 06, 2025 | 451 |
| Doomed to Code: How we Teamed Up with Fireworks AI at MistralAI Hackathon to Conquer the Shores of Hell | Oct 06, 2025 | 3100 |
| Sentient & Fireworks Powers Decentralized AI At Viral Scale | Oct 06, 2025 | 1412 |
| FireAttention V2: 12x faster to make Long Contexts practical for Online Inference | Oct 06, 2025 | 891 |
| Announcing Embeddings and Reranking On Fireworks AI | Oct 15, 2025 | 899 |
| Optimizing Llama 4 Maverick on Fireworks AI | Oct 06, 2025 | 1205 |
| DeepSeek V3.1 now on Fireworks AI! | Oct 06, 2025 | 653 |
| How Enterprises are using Multimodal Models in production with Fireworks | Oct 06, 2025 | 686 |
| Fireworks AI Now Supports NVIDIA NIM Deployments for Blazing AI Inference | Oct 06, 2025 | 884 |
| 3D FireOptimizer: Automating the Multi-Dimensional Tradeoffs in LLM Serving | Oct 06, 2025 | 1447 |
| Accelerating Code Completion with Fireworks Fast LLM Inference | Oct 06, 2025 | 639 |
| Real-time, performant code assistance: How Sourcegraph scaled with Fireworks AI | Oct 06, 2025 | 1220 |
| Reinforcement Fine Tuning (Beta): Train expert open models to surpass closed frontier models | Oct 06, 2025 | 958 |
| FireAttention V3: Enabling AMD as a viable alternative for GPU inference | Oct 06, 2025 | 1910 |
| DeepSeek R1: All you need to know 🐳 | Oct 06, 2025 | 1502 |
| Getting Started with Stability’s API Powered by Fireworks | Oct 06, 2025 | 1040 |
| How Notion Cuts Latency 4x and Scales Enterprise AI Workflows with Fireworks AI | Oct 06, 2025 | 584 |
| LLM on the edge: Model picking with Fireworks Eval Protocol + Ollama | Oct 18, 2025 | 912 |
| Fireworks and AMD partner to power the next gen of AI infrastructure on AMD Instinct™ GPUs | Oct 20, 2025 | 415 |
| Deployment Shapes: One-Click Deployment Configured For You | Oct 24, 2025 | 875 |
| We raised $250M To Help Enterprises Own Their AI | Oct 28, 2025 | 818 |
| Accelerate your Vision Pipelines with the new NVIDIA Nemotron Nano 2 VL Model on Fireworks AI | Oct 27, 2025 | 831 |
| Genspark’s Deep Research Agent Outperforms a Frontier Closed Model in Quality and Tool Calls using Fireworks RFT, Achieving a 50% Cost Reduction | Nov 01, 2025 | 1126 |
| 40x Faster and Smarter Outputs: How Vercel Turbocharged their Code Fixing Model with Open Models, Speculative Decoding and Reinforcement Fine Tuning on Fireworks | Oct 31, 2025 | 1086 |
| Fireworks RFT: Build AI agents with fine-tuned open models that outperform frontier closed models | Nov 11, 2025 | 1046 |
| Modernizing Healthcare with AI: How RADPAIR and Fireworks Unlock Smarter Radiology Workflows | Nov 09, 2025 | 2408 |
| 50 Trillion Tokens Per Day: The State of Agent Environments | Nov 19, 2025 | 2411 |
| Fireworks Achieves Triple ISO Certification, giving Enterprises Full Control and Trust in AI at Scale | Nov 20, 2025 | 771 |
| Eval Protocol: RL on your agents, in any environment | Nov 21, 2025 | 1400 |