Baseten on HN
33 posts with 1+ points since 2022
Hacker News Posts
| Title | Points | Comments | Date |
|---|---:|---:|---|
| Show HN: ChatLLaMA – A ChatGPT style chatbot for Facebook's LLaMA | 402 | -- | 2023-03-22 |
| Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs | 247 | -- | 2025-08-07 |
| A guide to open-source LLM inference and performance | 113 | -- | 2023-11-20 |
| Show HN: Baseten – Build ML-powered applications | 112 | -- | 2022-04-26 |
| DALL-E Mini – Generate images from a text prompt | 52 | -- | 2022-06-10 |
| How we got Stable Diffusion XL inference to under 2 seconds | 51 | -- | 2023-08-31 |
| Show HN: Free Stable Diffusion 2.0 hosted interface | 25 | -- | 2022-11-24 |
| Show HN: Fine-tune generative models in 1 line of code | 16 | -- | 2023-03-01 |
| Show HN: Baseten Chains – Framework and SDK for Multi-Model AI Products | 9 | -- | 2024-06-27 |
| Hosted Stable Diffusion Demo | 7 | -- | 2022-08-24 |
| Serving four million Riffusion requests in two days | 5 | -- | 2022-12-21 |
| Try it yourself: Speech to text with Whisper | 5 | -- | 2022-10-01 |
| How BaseTen is using “docs as code” | 5 | -- | 2022-03-09 |
| SDXL inference in under 2 seconds | 3 | -- | 2023-08-31 |
| Deploying Stable Diffusion in Production Using Truss | 3 | -- | 2022-09-01 |
| How We Built the Fastest Kimi K2.5 on Artificial Analysis | 3 | -- | 2026-02-11 |
| Open Source Inference Engine Baseten Raises $40M from IVP, Spark and Greylock | 2 | -- | 2024-03-14 |
| Faster Mixtral inference with TensorRT-LLM and quantization | 2 | -- | 2023-12-27 |
| How to double tokens per second for Llama 3 with Medusa | 2 | -- | 2024-08-20 |
| Show HN: Automatically Build Nvidia TRT-LLM Engines | 2 | -- | 2024-08-01 |
| FP8: Efficient model inference with 8-bit floating point numbers | 2 | -- | 2024-03-08 |
| Code generation interactive demo (Salesforce Codegen mono 2B) | 2 | -- | 2022-07-01 |
| How to build function calling and JSON mode for open-source and fine-tuned … | 1 | -- | 2024-09-12 |
| Show HN: 60% higher tokens per second for 70B custom LLMs | 1 | -- | 2024-07-31 |
| Introduction to quantizing machine learning models | 1 | -- | 2024-02-16 |
| Three techniques to adapt LLMs for any use case | 1 | -- | 2023-06-15 |
| Accelerating model deployment: 100X faster dev loops with draft models | 1 | -- | 2022-12-09 |
| Demo – Text generation with EleutherAI's GPT-J-6B model | 1 | -- | 2022-04-29 |
| Deploying custom ComfyUI workflows as APIs | 1 | -- | 2024-11-20 |
| Continuous vs. dynamic batching for AI inference | 1 | -- | 2025-08-06 |
| Continual learning and the post monolith AI era | 1 | -- | 2026-02-06 |
| Inferless Joins Baseten | 1 | -- | 2026-02-16 |
| Show HN: Inference Engineering | 1 | -- | 2026-02-23 |