XLA - TensorFlow, compiled
Blog post from Google Cloud
XLA (Accelerated Linear Algebra) is a compiler developed by Google to improve TensorFlow performance by optimizing dataflow graphs with just-in-time (JIT) compilation. It analyzes a TensorFlow graph at runtime, specializes it for the actual tensor dimensions and types, and fuses multiple operations into efficient native machine code for CPUs, GPUs, and custom accelerators such as Google's TPU. Because fused operations need fewer kernel launches and fewer memory allocations, XLA addresses performance concerns while retaining TensorFlow's flexibility. For restricted-memory environments such as mobile devices, ahead-of-time (AOT) compilation with tools like tfcompile also reduces executable size significantly. Although still in its early stages, XLA lets a new device backend be added with less implementation effort than re-implementing every TensorFlow operation, paving the way for TensorFlow to run efficiently on a broader range of hardware.
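
To make the idea of operation fusion more concrete, here is a minimal conceptual sketch in plain Python. It is not XLA's implementation and uses no TensorFlow API. The `Stats` class, the `run_kernel` helper, and the choice of `relu(x * w + b)` as the example computation are all hypothetical and exist only for illustration. The sketch simulates each "kernel launch" as one pass over the data that allocates one output buffer, then compares three separate element-wise operations against a single fused one.

```python
from typing import Callable, List

Vector = List[float]


class Stats:
    """Hypothetical bookkeeping for simulated kernel launches and buffers."""

    def __init__(self) -> None:
        self.launches = 0
        self.buffers = 0


def run_kernel(stats: Stats, fn: Callable[..., float], *inputs: Vector) -> Vector:
    """Simulate one kernel launch: one pass over the inputs, one new output buffer."""
    stats.launches += 1
    stats.buffers += 1
    return [fn(*args) for args in zip(*inputs)]


def unfused(x: Vector, w: Vector, b: Vector, stats: Stats) -> Vector:
    """relu(x * w + b) as three separate element-wise operations."""
    prod = run_kernel(stats, lambda xi, wi: xi * wi, x, w)       # temporary 1
    total = run_kernel(stats, lambda pi, bi: pi + bi, prod, b)   # temporary 2
    return run_kernel(stats, lambda ti: max(ti, 0.0), total)     # final result


def fused(x: Vector, w: Vector, b: Vector, stats: Stats) -> Vector:
    """The same computation as one fused operation with no temporaries."""
    return run_kernel(stats, lambda xi, wi, bi: max(xi * wi + bi, 0.0), x, w, b)


if __name__ == "__main__":
    x, w, b = [1.0, -2.0, 3.0], [2.0, 2.0, 2.0], [0.5, 0.5, 0.5]
    s_unfused, s_fused = Stats(), Stats()
    assert unfused(x, w, b, s_unfused) == fused(x, w, b, s_fused)
    print("unfused launches/buffers:", s_unfused.launches, s_unfused.buffers)
    print("fused launches/buffers:  ", s_fused.launches, s_fused.buffers)
```

Both versions compute the same result, but the fused one does it in a single simulated launch with a single output buffer. A real compiler performs this kind of transformation on the graph and emits machine code for the target device; the sketch only models the bookkeeping.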