
Baseten on HN

33 posts with 1+ points since 2022

Hacker News Posts
Title Points Comments Date
Show HN: ChatLLaMA – A ChatGPT style chatbot for Facebook's LLaMA 402 -- 2023-03-22
Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs 247 -- 2025-08-07
A guide to open-source LLM inference and performance 113 -- 2023-11-20
Show HN: Baseten – Build ML-powered applications 112 -- 2022-04-26
DALL-E Mini – Generate images from a text prompt 52 -- 2022-06-10
How we got Stable Diffusion XL inference to under 2 seconds 51 -- 2023-08-31
Show HN: Free Stable Diffusion 2.0 hosted interface 25 -- 2022-11-24
Show HN: Fine-tune generative models in 1 line of code 16 -- 2023-03-01
Show HN: Baseten Chains – Framework and SDK for Multi-Model AI Products 9 -- 2024-06-27
Hosted Stable Diffusion Demo 7 -- 2022-08-24
Serving four million Riffusion requests in two days 5 -- 2022-12-21
Try it yourself: Speech to text with Whisper 5 -- 2022-10-01
How BaseTen is using “docs as code” 5 -- 2022-03-09
SDXL inference in under 2 seconds 3 -- 2023-08-31
Deploying Stable Diffusion in Production Using Truss 3 -- 2022-09-01
How We Built the Fastest Kimi K2.5 on Artificial Analysis 3 -- 2026-02-11
Open Source Inference Engine Baseten Raises $40M from IVP, Spark and Greylock 2 -- 2024-03-14
Faster Mixtral inference with TensorRT-LLM and quantization 2 -- 2023-12-27
How to double tokens per second for Llama 3 with Medusa 2 -- 2024-08-20
Show HN: Automatically Build Nvidia TRT-LLM Engines 2 -- 2024-08-01
FP8: Efficient model inference with 8-bit floating point numbers 2 -- 2024-03-08
Code generation interactive demo (Salesforce Codegen mono 2B) 2 -- 2022-07-01
How to build function calling and JSON mode for open-source and fine-tuned … 1 -- 2024-09-12
Show HN: 60% higher tokens per second for 70B custom LLMs 1 -- 2024-07-31
Introduction to quantizing machine learning models 1 -- 2024-02-16
Three techniques to adapt LLMs for any use case 1 -- 2023-06-15
Accelerating model deployment: 100X faster dev loops with draft models 1 -- 2022-12-09
Demo – Text generation with EleutherAI's GPT-J-6B model 1 -- 2022-04-29
Deploying custom ComfyUI workflows as APIs 1 -- 2024-11-20
Continuous vs. dynamic batching for AI inference 1 -- 2025-08-06
Continual learning and the post monolith AI era 1 -- 2026-02-06
Inferless Joins Baseten 1 -- 2026-02-16
Show HN: Inference Engineering 1 -- 2026-02-23