Linux’s OOM Killer Is Getting More Accurate, But At A Cost


According to Phoronix, a recent Linux kernel patch addresses a significant inaccuracy in memory reporting, specifically on large-scale systems with hundreds of CPUs. The issue was discovered after a kernel upgrade caused a service with just five threads, expected to use about 100MB of memory, to report usage that was off by “tens of megabytes” on a 250-CPU machine. The inaccuracy matters because these statistics feed the system’s Out-Of-Memory (OOM) killer, which decides which process to terminate under memory pressure. The fix, committed by developer James Bottomley, changes the underlying counter mechanism to improve accuracy for both few-threaded and many-threaded applications. The improvement comes with a performance cost, however: benchmarks show it can be up to 12% slower for short-lived processes and can increase system time by 9% in certain workloads. The patch is currently in the mm-everything tree and aims to prevent the OOM killer from making disastrously wrong decisions.
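For context on where these numbers surface, the kernel exposes each process’s resident-set size in /proc/[pid]/status and its OOM “badness” score in /proc/[pid]/oom_score. The sketch below simply reads both for the current process; it illustrates the statistics the OOM killer’s decision ultimately depends on and is not code from the patch itself.

```c
/* Minimal sketch: read the per-process resident-set size (VmRSS) and
 * the OOM killer's badness score from /proc for the current process.
 * These are the user-visible faces of the counters the patch touches. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];

    if (f) {
        while (fgets(line, sizeof(line), f)) {
            /* VmRSS is the resident-set figure the OOM heuristics weigh heavily. */
            if (strncmp(line, "VmRSS:", 6) == 0)
                fputs(line, stdout);
        }
        fclose(f);
    }

    f = fopen("/proc/self/oom_score", "r");
    if (f) {
        if (fgets(line, sizeof(line), f)) {
            /* A higher oom_score makes this process a more likely victim. */
            printf("oom_score: %s", line);
        }
        fclose(f);
    }
    return 0;
}
```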


Accuracy At A Price

Here’s the thing about kernel engineering: it’s often a brutal trade-off. This patch, detailed in the mailing list post, swaps one problem for another. The old per-thread error was causing wild inaccuracies on these behemoth servers—imagine your system monitor being off by a huge percentage. That’s not just a monitoring headache; it’s a stability risk. The OOM killer acting on bad data is a nightmare scenario. You could kill a vital database process instead of the bloated log scraper, taking down a service entirely. So, the fix is necessary. But a 12% hit for short-lived processes? That’s not nothing. In environments that spin up countless containers or microservices, that could add up to real resource overhead.
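There is a practical mitigation that works regardless of how accurate the counters are: the long-standing oom_score_adj interface, which lets operators bias the OOM killer away from critical processes so that imperfect accounting is less likely to pick the wrong victim. A minimal sketch, assuming sufficient privileges (lowering the score generally requires CAP_SYS_RESOURCE):

```c
/* Sketch: lower this process's OOM badness adjustment so the kernel
 * prefers other victims under memory pressure. Writing a negative value
 * to oom_score_adj generally requires CAP_SYS_RESOURCE (e.g. root).
 * -1000 disables OOM killing for the process entirely; a value like
 * -500 merely biases the selection away from it. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/self/oom_score_adj", "w");

    if (!f) {
        perror("open oom_score_adj");
        return 1;
    }
    fprintf(f, "%d\n", -500);   /* bias, not full immunity */
    if (fclose(f) != 0) {
        perror("write oom_score_adj");
        return 1;
    }

    /* ... the critical work (e.g. the database) would run here ... */
    return 0;
}
```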

The Large Core Reality

This bug highlights a fascinating shift in computing. We’re now in an era where 250-CPU machines are a reality the kernel must handle gracefully, and old assumptions about concurrency and counter accuracy are breaking down. The actual commit shows the move to a `percpu_counter` structure, which is more accurate but more expensive to read. It makes you wonder: how many other subtle inaccuracies are lurking at this extreme scale? And for industries that rely on deterministic performance for real-time monitoring and control, such as manufacturing and industrial automation, where every megabyte and millisecond counts on the factory floor, this kind of core kernel behavior is crucial. In those demanding, 24/7 operational environments, the hardware and the kernel both have to be rock-solid.
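To see why an accurate read gets more expensive as core counts climb, here is a rough userspace model of a per-CPU batched counter, loosely in the spirit of the kernel’s `percpu_counter`: cheap updates land in per-CPU deltas, a fast read returns only the flushed total, and an exact read has to walk every CPU slot. The `NCPUS` and `BATCH` values, and all names here, are illustrative, not the kernel’s.

```c
/* Illustrative userspace model of a per-CPU batched counter. Updates
 * stay in a per-CPU delta until they exceed a batch size, an
 * approximate read returns only the shared total, and an exact read
 * must visit every per-CPU slot. Values are made up for the sketch. */
#include <stdio.h>
#include <stdlib.h>

#define NCPUS 256
#define BATCH 32

struct pcpu_counter {
    long total;            /* shared, flushed-in value        */
    long delta[NCPUS];     /* per-CPU pending contributions   */
};

static void counter_add(struct pcpu_counter *c, int cpu, long amount)
{
    c->delta[cpu] += amount;
    if (labs(c->delta[cpu]) >= BATCH) {   /* flush once the batch fills */
        c->total += c->delta[cpu];
        c->delta[cpu] = 0;
    }
}

/* Cheap but approximate: error can grow toward NCPUS * BATCH pages. */
static long counter_read_fast(const struct pcpu_counter *c)
{
    return c->total;
}

/* Exact but expensive: touches one slot (one cache line) per CPU. */
static long counter_read_exact(const struct pcpu_counter *c)
{
    long sum = c->total;
    for (int cpu = 0; cpu < NCPUS; cpu++)
        sum += c->delta[cpu];
    return sum;
}

int main(void)
{
    struct pcpu_counter rss = {0};

    /* Five "threads" each charge a handful of pages on different CPUs. */
    for (int cpu = 0; cpu < 5; cpu++)
        counter_add(&rss, cpu, 20);

    printf("fast read : %ld pages\n", counter_read_fast(&rss));
    printf("exact read: %ld pages\n", counter_read_exact(&rss));
    return 0;
}
```

In this toy model the fast read reports 0 pages while the exact read reports 100, the same flavor of drift the bug report describes, and on a 250-CPU box the exact read has to touch hundreds of per-CPU slots per query. That, broadly, is the accuracy-versus-cost tension the patch is navigating; the exact sources of the measured overhead are in the commit itself.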

A Necessary Evil For Now

So, is this a good patch? Probably. It’s plugging a leak that could sink the ship. But you have to look at it with some skepticism. A 9% increase in system time for a `make` workload is a tangible hit to developer productivity and build pipelines. The kernel community will likely accept this trade-off because correctness trumps speed when correctness means system stability. But you can bet there will be follow-up optimizations. Someone, somewhere, is already profiling that `percpu_counter` read path, looking for a way to claw back some of that lost performance. That’s how it goes. Fix the glaring bug first, then iterate on the speed. For anyone running massive, thread-dense workloads, this is a patch to watch for and benchmark carefully once it hits your kernel version. The alternative—a rogue OOM killer—is simply not an option.
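If you want to quantify the short-lived-process cost on your own hardware once the patch lands, a crude fork-heavy microbenchmark run on both kernels is enough to surface it. A sketch, with an arbitrary iteration count:

```c
/* Rough microbenchmark sketch for comparing kernels: spawn many
 * short-lived children and report wall-clock and system time.
 * Run the same binary on the old and new kernel and compare. */
#include <stdio.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define ITERATIONS 5000   /* arbitrary; pick something that runs for a few seconds */

int main(void)
{
    struct timespec start, end;
    struct rusage self, kids;

    clock_gettime(CLOCK_MONOTONIC, &start);

    for (int i = 0; i < ITERATIONS; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            /* Child does essentially nothing: the cost being measured is
             * process setup and teardown, where a counter change would bite. */
            _exit(0);
        } else if (pid > 0) {
            waitpid(pid, NULL, 0);
        } else {
            perror("fork");
            return 1;
        }
    }

    clock_gettime(CLOCK_MONOTONIC, &end);
    getrusage(RUSAGE_SELF, &self);
    getrusage(RUSAGE_CHILDREN, &kids);

    double wall = (end.tv_sec - start.tv_sec) +
                  (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("wall time         : %.3f s\n", wall);
    printf("parent sys time   : %ld.%06ld s\n",
           (long)self.ru_stime.tv_sec, (long)self.ru_stime.tv_usec);
    printf("children sys time : %ld.%06ld s\n",
           (long)kids.ru_stime.tv_sec, (long)kids.ru_stime.tv_usec);
    return 0;
}
```

Comparing wall time and system time across the two kernels is where a regression of the reported magnitude would show up, alongside whatever real workloads (builds, container churn) you already track.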
