Deepinfra Blog - Plushcap

Blog URL

deepinfra.com/blog

Posts YTD

75 ↑ vs 6 last year

Avg Posts/Month

0.0 since 2022

Post Details

Title	Author	Published	Words	HN Pts
Juggernaut FLUX is live on DeepInfra!	Oguz Vuruskaner	2025-03-25	349	--
Enhancing Open-Source LLMs with Function Calling Feature	Pernekhan Utemuratov	2024-01-26	1,025	--
Guaranteed JSON output on Open-Source LLMs.	Patrick Reiter Horn	2024-03-08	624	--
How to use CivitAI LoRAs: 5-Minute AI Guide to Stunning Double Exposure …	Oguz Vuruskaner	2025-01-23	391	--
Introducing Tool Calling with LangChain, Search the Web with Tavily and Tool …	Oguz Vuruskaner	2024-07-05	583	--
FLUX.1-dev Guide: Mastering Text-to-Image AI Prompts for Stunning and Consistent Visuals	Oguz Vuruskaner	2024-09-04	1,276	--
How to deploy Databricks Dolly v2 12b, instruction tuned casual language model.	Yessen Kanapin	2023-04-12	349	--
A Milestone on Our Journey Building Deep Infra and Scaling Open Source …	Yessen Kanapin	2025-04-22	589	--
Model Distillation Making AI Models Efficient	Deep	2025-04-10	1,426	--
Fork of Text Generation Inference.	Nikola Borisov	2023-08-09	417	--
Getting Started	Nikola Borisov	2023-03-02	278	--
Long Context models incoming	Iskren Chernev	2023-11-21	628	--
The easiest way to build AI applications with Llama 2 LLMs.	Nikola Borisov	2023-08-02	603	--
A short intro on running Stable Diffusion on DeepInfra	Iskren	2023-03-08	218	--
Use OpenAI API clients with LLaMas	Iskren Chernev	2023-08-28	343	--
Inference LoRA adapter model	Askar Aitzhan	2024-12-06	459	--
Unleashing the Potential of AI for Exceptional Gaming Experiences	Tsveta Gavanozova	2023-11-10	500	--
Chat with books using DeepInfra and LlamaIndex	Oguz Vuruskaner	2024-06-07	565	--
Seed Anchoring and Parameter Tweaking with SDXL Turbo: Create Stunning Cubist Art	Oguz Vuruskaner	2024-09-12	1,233	--
Deploy Custom LLMs on DeepInfra	Iskren Chernev	2024-03-01	276	--
Introducing GPU Instances: On-Demand GPU Compute for AI Workloads	Deep	2025-06-09	792	--
How to OpenAI Whisper with per-sentence and per-word timestamp segmentation using DeepInfra	Yessen Kanapin	2023-04-05	323	--
Building a Voice Assistant with Whisper, LLM, and TTS	Askar Aitzhan	2024-09-20	748	--
Search That Actually Works: A Guide to LLM Rerankers	Deep	2025-09-10	2,122	--
Lzlv model for roleplaying and creative work	Nikola Borisov	2023-11-02	532	--
Compare Llama2 vs OpenAI models for FREE.	Nikola Borisov	2023-09-28	406	--
Langchain improvements: async and streaming	Iskren Chernev	2023-10-25	292	--
How to deploy google/flan-ul2 - simple. (open source ChatGPT alternative)	Nikola Borisov	2023-03-17	495	--
Art That Talks Back: A Hands-On Tutorial on Talking Images	Oguz Vuruskaner	2025-03-07	591	--
Deep Infra Launches Access to NVIDIA Nemotron Models for Vision, Retrieval, and …	Yessen Kanapin	2025-10-28	814	--
How to deploy Databricks Dolly v2 12b, instruction tuned casual language model.	Yessen Kanapin	2023-04-12	541	--
Power the Next Era of Image Generation with FLUX.2 Visual Intelligence on …	Deep	2025-11-25	749	--
Kimi K2 0905 API from Deepinfra: Practical Speed, Predictable Costs, Built for …	Deep	2025-12-01	1,837	--
GLM-4.6 API: Get fast first tokens at the best $/M from Deepinfra's …	Deep	2025-12-01	2,022	--
Llama 3.1 70B Instruct API from DeepInfra: Snappy Starts, Fair Pricing, Production …	Deep	2025-12-01	2,197	--
Accelerating Reasoning Workflows with Nemotron 3 Nano on DeepInfra	Yessen Kanapin	2025-12-15	909	--
Pricing 101: Token Math & Cost-Per-Completion Explained	Deep	2026-01-13	6,002	--
From Precision to Quantization: A Practical Guide to Faster, Cheaper LLMs	Deep	2026-01-13	2,911	--
How the Models Perform on DeepInfra: Long-Context Performance, Throughput, and Cost	Deep	2026-01-13	1,730	--
Nemotron 3 Nano vs GPT-OSS-20B: Performance, Benchmarks & DeepInfra Results	Deep	2026-01-13	1,673	--
Build an OCR-Powered PDF Reader & Summarizer with DeepInfra (Kimi K2)	Deep	2026-01-13	3,944	--
LLM API Provider Performance KPIs 101: TTFT, Throughput & End-to-End Goals	Deep	2026-01-13	2,103	--
Nemotron 3 Nano Explained: NVIDIA’s Efficient Small LLM and Why It Matters	Deep	2026-01-13	2,280	--
Reliable JSON-Only Responses with DeepInfra LLMs	Deep	2026-02-02	1,713	--
Function Calling for AI APIs in DeepInfra — How to Extend Your …	Deep	2026-02-02	1,496	--
NVIDIA Nemotron API Pricing Guide 2026	Deep	2026-02-02	1,280	--
Best API for Kimi K2.5: Why DeepInfra Leads in Speed, TTFT, and …	Deep	2026-02-02	1,716	--
Build a Streaming Chat Backend in 10 Minutes	Deep	2026-02-02	2,435	--
Qwen API Pricing Guide 2026: Max Performance on a Budget	Deep	2026-02-02	1,412	--
Building Efficient AI Inference on NVIDIA Blackwell Platform	Deep	2026-02-12	1,084	--
Introducing NVIDIA Nemotron 3 Super on DeepInfra	Aray Sultanbekova	2026-03-11	938	--
Qwen3.5 27B API Benchmarks: Latency, Throughput & Cost	Deep	2026-04-03	1,375	--
Qwen3.5 9B API Benchmarks: Latency, Throughput & Cost	Deep	2026-04-03	1,298	--
Qwen3.5 4B via DeepInfra: Latency, Throughput & Cost	Deep	2026-04-03	1,099	--
GLM-5 API Benchmarks: Latency, Throughput & Cost	Deep	2026-04-03	1,543	--
Kimi K2 0905 API Benchmarks: Latency, Throughput & Cost	han	2026-04-03	1,465	--
NVIDIA Nemotron 3 Super 120B API Benchmarks: Latency & Cost	Deep	2026-04-03	1,697	--
Qwen3 Coder 480B A35B API Benchmarks: Latency & Cost	Deep	2026-04-03	1,498	--
MiniMax-M2.5 API Benchmarks: Latency, Throughput & Cost	Deep	2026-04-03	1,853	--
DeepSeek V3.2 API Benchmarks: Latency, Throughput & Cost	Deep	2026-04-03	2,011	--
Kimi K2.5 API Benchmarks: Latency, Throughput & Cost	Deep	2026-04-03	1,701	--
Qwen3.5 122B A10B API Benchmarks: Latency, Throughput & Cost	Deep	2026-04-03	1,361	--
Step 3.5 Flash API Benchmarks: Latency, Throughput & Cost	Deep	2026-04-03	1,632	--
Qwen3.5 0.8B API Benchmarks: Latency, Throughput & Cost	han	2026-04-03	1,312	--
Qwen3.5 397B A17B API Benchmarks: Latency, Throughput & Cost	Deep	2026-04-03	2,094	--
Qwen3.5 2B via DeepInfra: Latency, Throughput & Cost	Deep	2026-04-03	1,087	--
NVIDIA Nemotron 3 Nano 30B API Benchmarks: Latency & Cost	Deep	2026-04-03	1,256	--
GLM-4.7-Flash API Benchmarks: Latency, Throughput & Cost	Deep	2026-04-03	1,455	--
Qwen3.5 35B A3B API Benchmarks: Latency, Throughput & Cost	Deep	2026-04-03	1,201	--
Best Models for OpenClaw: Top Picks for Agentic Workloads	Deep	2026-04-28	2,642	--
Introducing NVIDIA Nemotron 3 Nano Omni on DeepInfra	Aray Sultanbekova	2026-04-28	1,109	--
What Is Google TurboQuant and What Does It Mean for Open Source …	Deep	2026-04-28	1,988	--
Inference Economics: True AI Costs at Scale	Deep	2026-04-28	1,796	--
Best OpenClaw Alternatives: Hermes Agent, ZeroClaw & NemoClaw	Deep	2026-04-28	2,193	--
How to Use OpenClaw with DeepInfra: Setup & Workflow Guide	Deep	2026-04-28	2,392	--
DeepInfra is now a supported Hugging Face Inference Provider	Aray Sultanbekova	2026-04-29	903	--
DeepSeek V4 Pro: Model Overview, Features & Performance Guide	Deep	2026-04-30	1,108	--
Kimi K2.6 is Now Available on DeepInfra	Deep	2026-04-30	1,477	--
DeepSeek V4 Pro (Max) API Benchmarks: Latency, Throughput & Cost Analysis	Deep	2026-04-30	2,101	--
Kimi K2.6 Model Overview: Architecture, Features & Capabilities	Deep	2026-04-30	1,323	--
Open vs Closed Source AI Models: Intelligence, Price & Speed Compared	Deep	2026-04-30	2,233	--
Kimi K2.6 API Benchmarks: Latency, TPS & Cost Analysis (2026)	Deep	2026-04-30	2,191	--
DeepSeek V4 Pro Is Now Available on DeepInfra	Deep	2026-04-30	1,530	--
Kimi K2.6 Pricing Guide 2026: Compare Costs & Deployment Strategies	Deep	2026-04-30	3,462	--
DeepSeek V4 Pro Pricing Guide 2026: Pricing, Providers & Cost Comparison	Deep	2026-04-30	3,759	--
We've Raised $107M to Build the Inference Cloud the AI Era Actually …	Yessen Kanapin	2026-05-04	952	--
Best API Providers for GLM-5.1 in 2026	Deep	2026-05-25	1,509	--
GLM-5.1 Model Overview: Features, Capabilities & Use Cases	Deep	2026-05-25	1,148	--
Best Kimi K2.6 API Providers for Developers (2026)	Deep	2026-05-25	1,165	--
GLM-5.1 on DeepInfra: Z.AI’s Agentic Engineering Model	Deep	2026-05-25	1,258	--
Gemma 4 on DeepInfra: Fast & Scalable Open AI Models	Deep	2026-05-25	1,488	--
GLM-5.1 API Benchmarks: Latency, Throughput & Cost	Deep	2026-05-25	2,142	--
NVIDIA Nemotron 3 Super on DeepInfra: 120B MoE Model	Deep	2026-05-25	1,486	--
Gemma 4 Model Overview: Features, Architecture & Use Cases	Deep	2026-05-25	1,258	--
Gemma 4 26B A4B API Benchmarks: Latency, Throughput & Cost	Deep	2026-05-25	1,660	--
Gemma 4 Pricing, Benchmarks & Real-World Cost Analysis	Deep	2026-05-25	2,955	--
Best SaaS Platforms for Deploying Gemma 4 in 2026	Deep	2026-05-25	1,467	--
Best API Providers for DeepSeek V4 in 2026	Deep	2026-05-25	1,179	--
Nemotron 3 Super Provider Pricing Comparison (2026)	Deep	2026-05-25	2,359	--
Best API Providers for NVIDIA Nemotron 3 Super 120B	Deep	2026-05-25	1,303	--
NVIDIA Nemotron 3 Super: Model Overview & Integration Guide	Deep	2026-05-25	1,160	--
GLM-5.1 Pricing Guide: API Cost Comparison & Analysis	Deep	2026-05-25	2,337	--
NVIDIA Nemotron 3 Super 120B API Benchmarks	Deep	2026-05-25	1,867	--
Open-Source vs Closed-Source AI Models: Is the Gap Worth It?	Deep	2026-05-26	3,331	--
OpenClaw Security: Prevent Prompt Injection & Supply Chain Attacks	Deep	2026-05-26	2,422	--
How Mixture of Experts Models Changed LLM Economics	Deep	2026-05-26	2,595	--
OpenClaw Use Cases That Deliver Real ROI	Deep	2026-05-26	2,380	--
OpenClaw Cost Optimization: Cut AI API Costs by 90%	Deep	2026-05-26	2,394	--
DeepInfra Launches Access to NVIDIA Cosmos 3 World Foundation Models for Physical …	Yessen Kanapin	2026-06-04	769	--
Nemotron 3 Ultra, 3.5 Content Safety and ASR models are now live …	Yessen Kanapin	2026-06-04	827	--
Step 3.7 Flash is Live on DeepInfra: An Agentic, Multimodal Model Built …	Deep	2026-06-12	910	--

Plushcap, by Matt Makai. 2021-2026.