Content Deep Dive
FP8: Efficient model inference with 8-bit floating point numbers
Blog post from Baseten
Post Details
Company: Baseten
Date Published:
Author: Pankaj Gupta, Philip Kiely
Word Count: 1,021
Language: English
Hacker News Points: 2
Summary
FP8 is an 8-bit floating-point data format that enables more efficient model inference. Its dynamic range is larger than INT8's, which makes it well suited to quantizing LLM activations: it delivers real performance gains without significant degradation in output quality.
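The dynamic-range point can be made concrete with a small sketch (not taken from the post itself): simulate activations with a few large outliers, quantize them to INT8 and to FP8 E4M3 using simple per-tensor scaling, and compare the error. The scaling scheme and variable names here are illustrative assumptions, not Baseten's implementation; the snippet assumes PyTorch 2.1 or later, which provides the torch.float8_e4m3fn dtype.

```python
import torch

torch.manual_seed(0)
# Simulated activations: mostly small values plus a few large outliers,
# a distribution typical of LLM activations.
x = torch.randn(4096)
x[::512] *= 50.0  # inject a few large outliers

# INT8: symmetric per-tensor quantization. One uniform step size must
# cover the whole range, so outliers crush precision for small values.
int8_scale = x.abs().max() / 127.0
x_int8 = torch.clamp((x / int8_scale).round(), -127, 127) * int8_scale

# FP8 E4M3: scale into the format's dynamic range (max finite value 448),
# then round-trip through the 8-bit floating-point dtype. Floating point's
# non-uniform spacing keeps small values comparatively precise.
fp8_scale = x.abs().max() / 448.0
x_fp8 = (x / fp8_scale).to(torch.float8_e4m3fn).to(torch.float32) * fp8_scale

print(f"INT8 mean abs error: {(x - x_int8).abs().mean():.5f}")
print(f"FP8  mean abs error: {(x - x_fp8).abs().mean():.5f}")
```

Running this, the FP8 round-trip error comes out well below the INT8 error, illustrating why the larger dynamic range matters for activation quantization.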