A mysterious CPU spike in ClickHouse Cloud on GCP led to months of debugging and ultimately exposed a deeper issue in the Linux kernel's memory management. The investigation began with an occasional hiccup in the ClickHouse Cloud infrastructure that engineers struggled to explain. At first it looked like random performance degradation, but further analysis revealed a hidden livelock caused by heavy contention on `mmap_lock`, the read-write semaphore that protects a process's address space. The lock was held for an exceptionally long time during page fault handling while the kernel scanned 1,093,267 pages in an effort to reclaim memory for the cgroup. Nearly all of the scanned pages were activated, and only 32 were actually reclaimed. The investigation later uncovered a second problem involving `lru_lock`, the spinlock that protects `struct lruvec`. The fix was to enable the Multi-Gen LRU (MGLRU) mechanism, which is designed to improve page reclaim and reduce contention on that spinlock. This change resolved the issue and improved overall system performance.
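
To put the reclaim numbers in perspective and show how one might check whether MGLRU is active on a host, here is a minimal Python sketch. It assumes a kernel built with MGLRU support, which exposes the bitmask file `/sys/kernel/mm/lru_gen/enabled` described in the kernel's Multi-Gen LRU admin guide. The helper names (`parse_lru_gen_mask`, `mglru_enabled`, `reclaim_efficiency`) are hypothetical and chosen only for illustration; they are not taken from the ClickHouse investigation.

```python
from pathlib import Path

# sysfs file documented in the kernel's Multi-Gen LRU admin guide.
# It only exists on kernels built with MGLRU support.
LRU_GEN_ENABLED = Path("/sys/kernel/mm/lru_gen/enabled")


def parse_lru_gen_mask(raw: str) -> int:
    """Parse the hex bitmask the kernel reports, e.g. '0x0007'."""
    return int(raw.strip(), 16)


def mglru_enabled(raw: str) -> bool:
    """Bit 0x0001 of the mask is the main MGLRU switch."""
    return bool(parse_lru_gen_mask(raw) & 0x0001)


def reclaim_efficiency(scanned: int, reclaimed: int) -> float:
    """Fraction of scanned pages that were actually reclaimed."""
    return reclaimed / scanned if scanned else 0.0


def main() -> None:
    # Figures quoted in the text: 1,093,267 pages scanned, 32 reclaimed.
    ratio = reclaim_efficiency(1_093_267, 32)
    print(f"reclaim efficiency: {ratio:.6%}")

    if LRU_GEN_ENABLED.exists():
        state = mglru_enabled(LRU_GEN_ENABLED.read_text())
        print(f"MGLRU main switch on: {state}")
    else:
        print("MGLRU sysfs file not found (kernel may lack MGLRU support)")


if __name__ == "__main__":
    main()
```

With the figures above, the ratio works out to roughly one reclaimed page for every 34,000 scanned, which is why the scan held its locks for so long while freeing almost nothing. On kernels that support it, MGLRU is typically switched on by writing `y` to the same sysfs file as root; whether that is appropriate for a given workload is a separate judgment call.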