The Overflowing Timeout Error - A Debugging Journey in Memgraph!
Blog post from Memgraph
In this account of debugging inside Memgraph, the author walks through two overflow problems: an integer overflow in the query timeout mechanism and a stack overflow that surfaced later. The team first hit the integer overflow in the timeout code, which they traced to the TSC-based timer; that timer proved unreliable on certain CPUs, specifically the AMD Ryzen 7. They replaced it with POSIX timers, which gave a more stable solution without a performance penalty. Around the same time, integrating the jemalloc library introduced segmentation faults caused by stack overflows, partly because the helper threads that POSIX timers create for SIGEV_THREAD notifications run with a minimal stack size. The fix was to switch to SIGEV_THREAD_ID, which directs the timer's signal to a specific thread, avoiding the stack overflow while remaining compatible with jemalloc. The post presents this debugging process, complex as it was, as an opportunity to learn more about programming and systems engineering.
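To make the SIGEV_THREAD_ID approach concrete, here is a minimal Linux-only sketch of a one-shot POSIX timer whose expiry signal is delivered to the thread that armed it, so no helper thread (and no small helper stack) is involved. This is an illustration under stated assumptions, not Memgraph's actual code: the names `RunWithTimeout`, `OnTimeout`, and `g_expired` are hypothetical, and the choice of `SIGRTMIN` and `CLOCK_MONOTONIC` is an assumption made for the example.

```cpp
#include <atomic>
#include <csignal>
#include <ctime>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Some glibc releases expose the target-thread field only through the
// union member; define the conventional alias if it is missing.
#ifndef sigev_notify_thread_id
#define sigev_notify_thread_id _sigev_un._tid
#endif

// Hypothetical flag that a query executor could poll to notice its timeout.
static std::atomic<bool> g_expired{false};

// The handler only stores to an atomic flag, which keeps it
// async-signal-safe on platforms where atomic<bool> is lock-free.
static void OnTimeout(int) { g_expired.store(true, std::memory_order_relaxed); }

// Arms a one-shot timer that signals the calling thread after `ms`
// milliseconds (ms must be > 0, since a zero value disarms the timer),
// then simulates work until the flag is set. Returns true if the
// timeout fired.
bool RunWithTimeout(long ms) {
  g_expired.store(false);

  struct sigaction sa {};
  sa.sa_handler = OnTimeout;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGRTMIN, &sa, nullptr) != 0) return false;

  // SIGEV_THREAD_ID: deliver the signal to one specific kernel thread
  // instead of running a callback on a library-created helper thread.
  struct sigevent sev {};
  sev.sigev_notify = SIGEV_THREAD_ID;
  sev.sigev_signo = SIGRTMIN;
  sev.sigev_notify_thread_id = static_cast<pid_t>(syscall(SYS_gettid));

  timer_t timer;
  if (timer_create(CLOCK_MONOTONIC, &sev, &timer) != 0) return false;

  struct itimerspec spec {};
  spec.it_value.tv_sec = ms / 1000;
  spec.it_value.tv_nsec = (ms % 1000) * 1000000L;
  if (timer_settime(timer, 0, &spec, nullptr) != 0) {
    timer_delete(timer);
    return false;
  }

  // Stand-in for query execution: nap in 1 ms steps, giving up after
  // roughly two seconds if the signal never arrives.
  for (int i = 0; i < 2000 && !g_expired.load(); ++i) {
    struct timespec nap {0, 1000000L};
    nanosleep(&nap, nullptr);
  }

  timer_delete(timer);
  return g_expired.load();
}
```

Calling `RunWithTimeout(20)` from any thread should return `true` once the signal handler has run on that same thread. On glibc versions older than 2.34 the timer functions live in librt, so the program needs to be linked with `-lrt`.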