Announcing TensorRT integration with TensorFlow 1.7
Blog post from Google Cloud
Google and NVIDIA have announced the integration of NVIDIA's TensorRT with TensorFlow. The goal is to optimize deep learning models for GPU inference, raising throughput and reducing latency. The integration brings TensorRT's FP16 and INT8 optimizations into TensorFlow and automatically selects platform-specific kernels. It also simplifies the workflow: TensorFlow hands compatible sub-graphs to TensorRT for optimization and executes the rest of the graph itself. Models can therefore be developed with TensorFlow's full feature set and still benefit from TensorRT's optimizations. A new TensorFlow API performs the conversion by transforming a frozen TensorFlow graph into a TensorRT inference graph. In tests, models such as ResNet-50 showed significant performance improvements with the integration.

TensorRT also supports INT8 quantization, which speeds up computation and reduces memory requirements with minimal accuracy loss; a calibration process is used to choose the quantization so that accuracy is preserved. On NVIDIA Volta GPUs, the integration uses Tensor Cores to increase throughput further. The release is expected to deliver high performance while keeping TensorFlow's flexibility and ease of use, and the integration will be available through the standard pip installation process once TensorFlow 1.7 is released.
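To make the INT8 calibration idea concrete, here is a minimal sketch in plain Python. It is not TensorRT's calibrator and not the TensorFlow API. It assumes the simplest possible scheme, symmetric max-abs scaling, where a calibration batch decides how floating-point values map onto the signed 8-bit range. The helper names `calibrate_scale`, `quantize`, and `dequantize` are hypothetical and exist only for this illustration.

```python
# Illustrative only: symmetric max-abs INT8 quantization with a calibration step.
# This is an assumed, simplified scheme, not TensorRT's actual calibration algorithm.

def calibrate_scale(samples, num_bits=8):
    """Pick a scale so the largest observed magnitude maps to the integer limit."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    max_abs = max(abs(x) for x in samples)
    if max_abs == 0.0:
        return 1.0                           # all-zero data: any scale works
    return max_abs / qmax

def quantize(values, scale, num_bits=8):
    """Map floats to integers in [-qmax, qmax], clamping out-of-range values."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]

def dequantize(qvalues, scale):
    """Map quantized integers back to approximate floats."""
    return [q * scale for q in qvalues]

# A tiny stand-in for activations collected while running calibration data.
calibration_batch = [-2.0, -0.5, 0.0, 0.25, 1.0, 4.0]

scale = calibrate_scale(calibration_batch)
quantized = quantize(calibration_batch, scale)
restored = dequantize(quantized, scale)

# Worst-case round-trip error over the calibration data.
max_error = max(abs(a - b) for a, b in zip(calibration_batch, restored))

print("scale:", scale)
print("quantized:", quantized)
print("max round-trip error:", max_error)
```

For values inside the calibrated range, the round-trip error stays within about half a quantization step (`scale / 2`), which is the sense in which a well-chosen calibration keeps accuracy loss small. Values outside that range are clamped, which is why the calibration data should be representative of real inputs.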