Killing the ProcessPoolExecutor
Blog post from Tinybird
A seemingly minor inefficiency in an integration test at Tinybird, a real-time data platform, led to significant system optimizations. A single test took a second instead of a millisecond, which prompted the team to investigate. The investigation uncovered inefficiencies related to Python's Global Interpreter Lock (GIL) and to the use of ProcessPoolExecutor for parallel processing. By moving CPU-intensive tasks into C++ extensions and running them on a ThreadPoolExecutor with finer control over when the GIL is held, the team cut global memory usage by 50%, lowered CPU usage by 10-20%, reduced the number of threads and processes by 60-70%, and virtually eliminated I/O traffic. The change improved performance and also simplified system management, showing the value of questioning existing assumptions and exploring alternatives to make an application more efficient.
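The general pattern can be sketched in a few lines. This is only an illustration under stated assumptions, not Tinybird's code: the standard library's `zlib.compress` stands in for a custom C++ extension, because in CPython it is implemented in C and releases the GIL while it compresses. The names `compress_chunk` and `compress_all` and the chunk sizes are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_chunk(chunk: bytes) -> bytes:
    # Stand-in for a CPU-heavy native call. In CPython, zlib.compress
    # releases the GIL during compression, so several threads can run
    # this work in parallel inside a single process.
    return zlib.compress(chunk, 6)


def compress_all(chunks, max_workers=4):
    # Threads share the process's memory, so the inputs and results are
    # not pickled and copied between processes as they would be with
    # ProcessPoolExecutor.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(compress_chunk, chunks))


# Hypothetical workload: eight 100 kB buffers.
chunks = [bytes([i]) * 100_000 for i in range(8)]
compressed = compress_all(chunks)
```

The trade-off is that this only helps when the hot path actually runs outside the GIL. Pure-Python CPU work submitted to a thread pool would still be serialized by the interpreter.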