
FP8: Efficient model inference with 8-bit floating point numbers

Blog post from Baseten

Post Details
Company: Baseten
Date Published:
Authors: Pankaj Gupta, Philip Kiely
Word Count: 1,021
Language: English
Hacker News Points: 2
Summary

FP8 is an 8-bit floating point data format that enables more efficient model inference. Compared to INT8, it offers a larger dynamic range, which makes it better suited to quantizing LLM activations and yields meaningful performance improvements without significant degradation of output quality.
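
To make the INT8 comparison concrete, here is a minimal PyTorch sketch (not code from the post itself) that quantizes an outlier-heavy activation tensor both ways. The synthetic tensor and the error measurement are illustrative assumptions, and it requires PyTorch 2.1 or later, which provides the torch.float8_e4m3fn dtype.

```python
import torch

# Simulated activations: mostly small values with rare large outliers,
# a distribution typical of transformer activations (synthetic example).
activations = torch.cat([torch.randn(1000) * 0.1, torch.tensor([20.0, -18.0])])

# INT8 quantization: a single scale must cover the outliers, so the
# small values collapse into only a few quantization bins.
scale = activations.abs().max() / 127.0
int8 = torch.clamp((activations / scale).round(), -128, 127).to(torch.int8)
int8_dequant = int8.to(torch.float32) * scale

# FP8 (E4M3) cast: 4 exponent bits give a dynamic range up to +/-448,
# so small and large values are both representable without rescaling.
fp8 = activations.to(torch.float8_e4m3fn)
fp8_dequant = fp8.to(torch.float32)

# Compare reconstruction error on the bulk of the distribution.
small = activations.abs() < 1.0
print("INT8 mean abs error (small values):",
      (int8_dequant - activations)[small].abs().mean().item())
print("FP8  mean abs error (small values):",
      (fp8_dequant - activations)[small].abs().mean().item())
```

On a distribution like this, the FP8 error on the small values is typically much lower than the INT8 error, which is the dynamic-range advantage the summary describes.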