BentoML Hacker News

Posts by Month (37 total)

Hacker News Posts

Title	Points	Comments	Date
Navigating the World of Large Language Models	48	--	2024-03-22
Is LMDeploy the Ultimate Solution? Why It Outshines VLLM, TRT-LLM, TGI, and …	16	--	2024-06-20
Benchmarking LLM Inference Back Ends: VLLM, LMDeploy, MLC-LLM, TensorRT-LLM, TGI	15	--	2024-07-05
A List of Top Open-Source Embedding Models	5	--	2024-10-30
Building RAG with Open-Source and Custom AI Models	4	--	2024-05-06
Solving ML Model Reproducibility: Lessons Learned from a Covid Hackathon	4	--	2022-04-25
The Shift to Distributed LLM Inference	4	--	2025-06-11
Nvidia Data Center GPUs Explained: From A100 to B200 and Beyond	4	--	2025-08-28
From Ollama to OpenLLM: Running LLMs in the Cloud	3	--	2024-07-18
Stable Diffusion 3: Text Master, Prone Problems?	3	--	2024-06-18
A Guide to Open-Source Image Generation Models	3	--	2024-03-28
How to Beat the GPU CAP theorem in AI Inference	3	--	2025-04-30
Where to Buy or Rent GPUs for LLM Inference: The 2026 GPU …	3	--	2025-10-31
Three Levels of Running LLMs from Laptop to Cluster-Scale Distributed Inference	3	--	2025-12-02
Exploring the World of Open-Source Text-to-Speech Models	2	--	2024-09-20
Serving LlamaIndex as Rest APIs	2	--	2024-06-03
Deploying Stable Video Diffusion with BentoSVD	2	--	2023-11-28
Building a Production-Ready LangChain Application with BentoML and OpenLLM	2	--	2023-10-22
Monitoring Metrics in BentoML with Prometheus and Grafana	2	--	2023-10-20
2024 State of AI Inference Infrastructure Survey Results	2	--	2025-02-26
The Complete Guide to DeepSeek Models: From V3 to R1 and Beyond	2	--	2025-03-07
Six Infrastructure Pitfalls Slowing Down Your AI Progress	2	--	2025-03-19
Cold-Starting LLMs on Kubernetes in Under 30 Seconds	2	--	2025-04-11
What Is InferenceOps	2	--	2025-07-01
The Best Open-Source Small Language Models	2	--	2025-12-17
Modular Acquires BentoML	2	--	2026-02-11
Top Open-Source Vision Language Models	1	--	2024-10-11
Tuning TensorRT-LLM for Optimal Serving	1	--	2024-09-20
Compound AI Systems	1	--	2024-08-24
Building a RAG App with BentoCloud and Milvus Lite	1	--	2024-06-14
Scaling AI Models Like You Mean It	1	--	2024-04-26
A Guide to ComfyUI Custom Nodes	1	--	2025-01-02
Secure and Private DeepSeek Deployment	1	--	2025-02-14
Benchmarks Show Speculative Decoding Needs the Right Draft Model for 3× Gains	1	--	2025-08-08
AMD Data Center GPUs Explained: MI250X, MI300X, MI350X and Beyond	1	--	2025-09-04
LLM Benchmark and Optimization Explorer	1	--	2025-09-11
ChatGPT Usage Limits: What They Are and How to Get Rid of …	1	--	2025-10-24

Plushcap, by Matt Makai. 2021-2026.
