Company
Tecton
Date Published
Author
Mihir Mathur
Word count
1213
Language
English
Hacker News points
24

Summary

The Tecton Serving Cache is a server-side cache that reduces the infrastructure cost of feature serving for machine learning models at high scale. It simplifies feature caching, improving performance and cost efficiency as systems grow. The cache suits AI applications such as recommendation systems, personalized search, customer targeting, and forecasting, where slightly stale feature values are an acceptable trade-off for major reductions in latency and cost. To use the Tecton Serving Cache, modelers add two pieces of configuration: a cached Feature View and a Feature Service with caching enabled (sketched below). At request time, the cache serves precomputed feature values from memory instead of querying the online store; parameters such as `max_age_seconds` set the maximum number of seconds a cached value may be served before it expires. Benchmarks show up to 80% latency reduction and up to 95% cost reduction compared to baseline feature-retrieval patterns. The Tecton Serving Cache uses Redis as its backend and caches at the entity level, striking a balance between Feature View-level and Feature Service-level caching. Planned improvements include more configuration flexibility, request-level cache directives, and further performance gains.
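
The two pieces of configuration might look roughly like this in Tecton's Python SDK. Only `max_age_seconds` is named in the post; `CacheConfig`, `enable_online_caching`, and the surrounding Feature View definition are assumptions based on Tecton's SDK conventions, and the entity and data source are hypothetical repo objects.

```python
from datetime import datetime, timedelta

from tecton import FeatureService, batch_feature_view, CacheConfig

from entities import user               # hypothetical: an Entity defined elsewhere in the repo
from data_sources import transactions   # hypothetical: a batch source defined elsewhere

# Piece 1: a Feature View that opts into caching. `max_age_seconds` (from
# the post) bounds how long a cached value may be served before it expires.
@batch_feature_view(
    sources=[transactions],
    entities=[user],
    mode="spark_sql",
    online=True,
    offline=True,
    feature_start_time=datetime(2023, 1, 1),
    batch_schedule=timedelta(days=1),
    cache_config=CacheConfig(max_age_seconds=3600),  # serve cached values for up to 1 hour
)
def user_transaction_counts(transactions):
    return f"""
        SELECT user_id, COUNT(*) AS transaction_count, MAX(timestamp) AS timestamp
        FROM {transactions}
        GROUP BY user_id
    """

# Piece 2: a Feature Service with caching enabled. Cached Feature Views in
# this service are read through the serving cache at request time.
fraud_service = FeatureService(
    name="fraud_service",
    features=[user_transaction_counts],
    enable_online_caching=True,
)
```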
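
The post names Redis as the backend and entity-level keys as the granularity. A minimal sketch of what an entity-level lookup could look like, assuming a hypothetical key scheme and stored format (none of these helpers are Tecton APIs):

```python
import json
import time

import redis  # redis-py client; Redis as the backend per the post

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def cache_key(feature_view: str, entity_id: str) -> str:
    # Entity-level granularity: one entry per (Feature View, entity) pair.
    # A Feature View shared by several Feature Services reuses the same
    # entry, unlike Feature Service-level keys, which would duplicate it;
    # unlike Feature View-level keys, the entry stays small per entity.
    return f"fv:{feature_view}:entity:{entity_id}"  # hypothetical key scheme

def get_cached_features(feature_view: str, entity_id: str, max_age_seconds: int):
    raw = r.get(cache_key(feature_view, entity_id))
    if raw is None:
        return None  # miss: caller falls back to the online store
    entry = json.loads(raw)
    if time.time() - entry["cached_at"] > max_age_seconds:
        return None  # older than max_age_seconds: treat as expired
    return entry["features"]

def put_cached_features(feature_view: str, entity_id: str,
                        features: dict, max_age_seconds: int) -> None:
    entry = json.dumps({"cached_at": time.time(), "features": features})
    # Also set a Redis TTL so expired entries are evicted server-side.
    r.set(cache_key(feature_view, entity_id), entry, ex=max_age_seconds)
```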