
Baseten on HN

33 posts with 1+ points since 2022

Hacker News Posts
Title Points Comments Date
Show HN: ChatLLaMA – A ChatGPT style chatbot for Facebook's LLaMA 402 -- 2023-03-22
Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs 247 -- 2025-08-07
A guide to open-source LLM inference and performance 113 -- 2023-11-20
Show HN: Baseten – Build ML-powered applications 112 -- 2022-04-26
DALL-E Mini – Generate images from a text prompt 52 -- 2022-06-10
How we got Stable Diffusion XL inference to under 2 seconds 51 -- 2023-08-31
Show HN: Free Stable Diffusion 2.0 hosted interface 25 -- 2022-11-24
Show HN: Fine-tune generative models in 1 line of code 16 -- 2023-03-01
Show HN: Baseten Chains – Framework and SDK for Multi-Model AI Products 9 -- 2024-06-27
Hosted Stable Diffusion Demo 7 -- 2022-08-24
Serving four million Riffusion requests in two days 5 -- 2022-12-21
Try it yourself: Speech to text with Whisper 5 -- 2022-10-01
How BaseTen is using “docs as code” 5 -- 2022-03-09
SDXL inference in under 2 seconds 3 -- 2023-08-31
Deploying Stable Diffusion in Production Using Truss 3 -- 2022-09-01
How We Built the Fastest Kimi K2.5 on Artificial Analysis 3 -- 2026-02-11
Open Source Inference Engine Baseten Raises $40M from IVP, Spark and Greylock 2 -- 2024-03-14
Faster Mixtral inference with TensorRT-LLM and quantization 2 -- 2023-12-27
How to double tokens per second for Llama 3 with Medusa 2 -- 2024-08-20
Show HN: Automatically Build Nvidia TRT-LLM Engines 2 -- 2024-08-01
FP8: Efficient model inference with 8-bit floating point numbers 2 -- 2024-03-08
Code generation interactive demo (Salesforce Codegen mono 2B) 2 -- 2022-07-01
How to build function calling and JSON mode for open-source and fine-tuned … 1 -- 2024-09-12
Show HN: 60% higher tokens per second for 70B custom LLMs 1 -- 2024-07-31
Introduction to quantizing machine learning models 1 -- 2024-02-16
Three techniques to adapt LLMs for any use case 1 -- 2023-06-15
Accelerating model deployment: 100X faster dev loops with draft models 1 -- 2022-12-09
Demo – Text generation with EleutherAI's GPT-J-6B model 1 -- 2022-04-29
Deploying custom ComfyUI workflows as APIs 1 -- 2024-11-20
Continuous vs. dynamic batching for AI inference 1 -- 2025-08-06
Continual learning and the post monolith AI era 1 -- 2026-02-06
Inferless Joins Baseten 1 -- 2026-02-16
Show HN: Inference Engineering 1 -- 2026-02-23