Message-ID: <blygjeudtqyxk7bhw5ycveofo4e322nycxyvupdnzq3eg7qtpo@cya4bifb2dlk>
Date: Thu, 6 Nov 2025 15:55:59 -0800
From: Shakeel Butt <shakeel.butt@...ux.dev>
To: Leon Huang Fu <leon.huangfu@...pee.com>
Cc: akpm@...ux-foundation.org, cgroups@...r.kernel.org, corbet@....net, 
	hannes@...xchg.org, inwardvessel@...il.com, jack@...e.cz, joel.granados@...nel.org, 
	kyle.meyer@....com, lance.yang@...ux.dev, laoar.shao@...il.com, 
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org, linux-mm@...ck.org, 
	mclapinski@...gle.com, mhocko@...nel.org, muchun.song@...ux.dev, 
	roman.gushchin@...ux.dev, yosry.ahmed@...ux.dev
Subject: Re: [PATCH mm-new v2] mm/memcontrol: Flush stats when write stat file

On Thu, Nov 06, 2025 at 11:30:45AM +0800, Leon Huang Fu wrote:
> On Thu, Nov 6, 2025 at 9:19 AM Shakeel Butt <shakeel.butt@...ux.dev> wrote:
> >
> > +Yosry, JP
> >
> > On Wed, Nov 05, 2025 at 03:49:16PM +0800, Leon Huang Fu wrote:
> > > On high-core count systems, memory cgroup statistics can become stale
> > > due to per-CPU caching and deferred aggregation. Monitoring tools and
> > > management applications sometimes need guaranteed up-to-date statistics
> > > at specific points in time to make accurate decisions.
> >
> > Can you explain a bit more about the environment where you are seeing
> > stale stats? More specifically, how often are the management applications
> > reading the memcg stats, and are these applications reading memcg
> > stats for each node of the cgroup tree?
> >
> > We force-flush all the memcg stats at the root level every 2 seconds, but
> > it seems like that is not enough for your case. I am fine with an explicit
> > way for users to flush the memcg stats. That way, only the users who want
> > fresh stats have to pay the flush cost.
> >
> 
> Thanks for the feedback. I encountered this issue while running the LTP
> memcontrol02 test case [1] on a 256-core server with the 6.6.y kernel on XFS,
> where it consistently failed.
> 
> I was aware that Yosry had improved the memory statistics refresh mechanism
> in "mm: memcg: subtree stats flushing and thresholds" [2], so I attempted to
> backport that patchset to 6.6.y [3]. However, even on the 6.15.0-061500-generic
> kernel with those improvements, the test still fails intermittently on XFS.
> 
> I've created a simplified reproducer that mirrors the LTP test behavior. The
> test allocates 50 MiB of page cache and then verifies that memory.current and
> memory.stat's "file" field are approximately equal (within 5% tolerance).
> 
> The failure pattern looks like:
> 
>   After alloc: memory.current=52690944, memory.stat.file=48496640, size=52428800
>   Checks: current>=size=OK, file>0=OK, current~=file(5%)=FAIL
> 
> Here's the reproducer code and test script (attached below for reference).
> 
> To reproduce on XFS:
>   sudo ./run.sh --xfs
>   for i in {1..100}; do sudo ./run.sh --run; echo "==="; sleep 0.1; done
>   sudo ./run.sh --cleanup
> 
> The test fails sporadically, typically a few times out of 100 runs, confirming
> that the improved flush isn't sufficient for this workload pattern.
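> 
> For reference, the 5% tolerance check described above boils down to the
> following (a minimal sketch using the values from the failing run; the
> variable names are illustrative, this is not the actual reproducer):
> 
> ```shell
> # Tolerance check the reproducer performs: memory.stat's "file" counter
> # must be within 5% of memory.current. Values are from the failing run.
> current=52690944
> file=48496640
> diff=$((current - file))
> [ "$diff" -lt 0 ] && diff=$((-diff))
> # Integer arithmetic: pass if diff <= 5% of current.
> if [ $((diff * 100)) -le $((current * 5)) ]; then
>     result=PASS
> else
>     result=FAIL
> fi
> echo "current=$current file=$file -> $result"
> ```
> 
> With these numbers the delta is 4 MiB (~8% of memory.current), so the
> check reports FAIL, matching the log line above.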

I was hoping you had a real-world workload/scenario that is facing this
issue. For the test, a simple 'sleep 2' would be enough.
Anyway, that is not an argument against adding an interface for flushing.
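
Going by the patch subject, the proposed interface triggers a flush when
userspace writes to the stat file. A hypothetical usage sketch (the cgroup
path, the written value, and the exact write semantics are assumptions,
pending the final patch):

```shell
# Hypothetical: writing to memory.stat forces an immediate stats flush,
# so the read that follows sees up-to-date counters. Only callers that
# perform the write pay the flush cost.
CG=/sys/fs/cgroup/test          # illustrative cgroup path
echo 1 > "$CG"/memory.stat
grep '^file ' "$CG"/memory.stat
```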

