How we improved Tensorflow Serving performance by over 70%

Post Details

Company

Mux

Date Published

Feb. 26, 2019

Author

Masroor Hasan

Word Count

1,852

Company Posts That Month

5

Language

English

Hacker News Points

-

Post removed?

No

Source URL

www.mux.com/blog/tuning-performance-of-tensorflow-serving-pipeline

Summary

TensorFlow Serving is a flexible server architecture for deploying and serving machine learning models. It provides monitoring components, a configurable architecture, and support for multiple ML models or model versions. The size of the "servable" matters: smaller models use less memory and storage, so they load faster. Latency can be improved on both the prediction server and the client. Building a CPU-optimized serving binary, enabling server-side batching, and implementing client-side batching can significantly reduce prediction latency. Hardware acceleration such as GPUs may also be worth considering for "offline" inference over massive volumes.
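
As a rough illustration of the client-side batching idea mentioned in the summary, the sketch below groups individual inputs into fixed-size batches so that each prediction request carries several instances instead of one. It is not taken from the original post. The helper names (`chunk`, `build_predict_payloads`), the sample inputs, and the batch size are hypothetical, and the `{"instances": [...]}` body follows the row format of the TensorFlow Serving REST predict API. No request is actually sent.

```python
import json
from typing import Iterator, List, Sequence


def chunk(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def build_predict_payloads(instances: Sequence, batch_size: int) -> List[str]:
    """Serialize each batch as one JSON request body (hypothetical helper)."""
    return [
        json.dumps({"instances": list(batch)})
        for batch in chunk(instances, batch_size)
    ]


# Hypothetical inputs: five feature vectors, grouped into batches of two.
inputs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
payloads = build_predict_payloads(inputs, batch_size=2)

# Five inputs now need three requests instead of five.
print(len(payloads))
```

Each payload would then be posted to the model's predict endpoint. A suitable batch size depends on the model and the hardware, so it is best treated as a parameter to tune by measurement.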

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
Real-time	1	370	104	48	-17%
