| Points | Title | Date |
|---|---|---|
| 113 | A guide to open-source LLM inference and performance | 2023-11-20 |
| 51 | How we got Stable Diffusion XL inference to under 2 seconds | 2023-08-31 |
| 9 | Show HN: Baseten Chains – Framework and SDK for Multi-Model AI Products | 2024-06-27 |
| 3 | SDXL inference in under 2 seconds | 2023-08-31 |
| 2 | Open Source Inference Engine Baseten Raises $40M from IVP, Spark and Greylock | 2024-03-14 |
| 2 | Faster Mixtral inference with TensorRT-LLM and quantization | 2023-12-27 |
| 2 | How to double tokens per second for Llama 3 with Medusa | 2024-08-20 |
| 2 | Show HN: Automatically Build Nvidia TRT-LLM Engines | 2024-08-01 |
| 2 | FP8: Efficient model inference with 8-bit floating point numbers | 2024-03-08 |
| 1 | How to build function calling and JSON mode for open-source and fine-tuned LLMs | 2024-09-12 |
| 1 | Show HN: 60% higher tokens per second for 70B custom LLMs | 2024-07-31 |
| 1 | Introduction to quantizing machine learning models | 2024-02-16 |
| 1 | Three techniques to adapt LLMs for any use case | 2023-06-15 |
| 402 | Show HN: ChatLLaMA – A ChatGPT style chatbot for Facebook's LLaMA | 2023-03-22 |
| 16 | Show HN: Fine-tune generative models in 1 line of code | 2023-03-01 |
| 1 | Deploying custom ComfyUI workflows as APIs | 2024-11-20 |
| 1 | Continuous vs. dynamic batching for AI inference | 2025-08-06 |
| 247 | Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs | 2025-08-07 |