TensorFlow Serving 1.0
Blog post from Google Cloud
TensorFlow Serving 1.0, released in August 2017, is a milestone for the project, which was first open-sourced in February 2016 as a high-performance serving system for machine-learned models in production environments. The 1.0 release is built from the TensorFlow head, and future versions will align with TensorFlow releases, with continued work on functionality and API stability for diverse use cases. The initial release consisted of libraries for managing model lifecycles and serving inference requests. It later gained a gRPC Model Server binary with a Predict API that can be deployed on Kubernetes. With more than 800 projects at Google using it in production, the system is now a stable, robust, high-performance implementation. The release introduces a prebuilt binary that can be installed with apt-get install, so setup no longer requires compiling from source, while Docker remains an option for non-Linux installations. The legacy SessionBundle model format is deprecated in favor of SavedModel, the format introduced with TensorFlow 1.0. The post encourages users to explore the project documentation and tutorials.
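
As a rough illustration of the setup described above, the following Python sketch checks that a model directory looks servable and builds the command line for the prebuilt server. It is a minimal sketch under stated assumptions, not part of the original post: the helper names `find_servable_versions` and `model_server_command` are hypothetical, the default port is arbitrary, and it assumes the apt package installs a `tensorflow_model_server` binary that accepts `--port`, `--model_name`, and `--model_base_path`, with each model version exported as a SavedModel into its own numeric subdirectory.

```python
from pathlib import Path


def find_servable_versions(model_base_path):
    """Return the sorted numeric version directories that hold a SavedModel.

    Assumes the layout <model_base_path>/<version>/saved_model.pb
    (or saved_model.pbtxt), one numeric subdirectory per version.
    """
    versions = []
    for child in Path(model_base_path).iterdir():
        # Skip anything that is not a purely numeric directory name.
        if not (child.is_dir() and child.name.isdecimal()):
            continue
        has_graph = (child / "saved_model.pb").exists() or (
            child / "saved_model.pbtxt"
        ).exists()
        if has_graph:
            versions.append(int(child.name))
    return sorted(versions)


def model_server_command(model_name, model_base_path, port=9000):
    """Build the argument list for launching the model server binary.

    The port default is an arbitrary choice for this sketch.
    """
    return [
        "tensorflow_model_server",
        f"--port={port}",
        f"--model_name={model_name}",
        f"--model_base_path={model_base_path}",
    ]
```

The returned list can be passed to `subprocess.run` once `find_servable_versions` reports at least one version. Checking the directory first is a design choice for the sketch: it surfaces a missing or legacy-format export before the server is started.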