linux-kernel - Re: [PATCH] mm: skip lru_note_cost() when scanning only file or anon

Message-ID: <20250711172028.GA991@cmpxchg.org>
Date: Fri, 11 Jul 2025 13:20:28 -0400
From: Johannes Weiner <hannes@...xchg.org>
To: Roman Gushchin <roman.gushchin@...ux.dev>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Shakeel Butt <shakeel.butt@...ux.dev>,
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
	Michal Hocko <mhocko@...nel.org>,
	David Hildenbrand <david@...hat.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: skip lru_note_cost() when scanning only file or anon

On Fri, Jul 11, 2025 at 08:50:44AM -0700, Roman Gushchin wrote:
> lru_note_cost() records the relative cost of incurring io and cpu spent
> on lru rotations, which is used to balance the pressure on file and
> anon memory. The applied pressure is inversely proportional to the
> recorded cost of reclaiming, but only within 2/3 of the range
> (swappiness aside).
> 
> This is useful when both anon and file memory are reclaimable; however,
> in many cases they are not: e.g. there might be no swap, proactive
> reclaim can target anon memory specifically, or the memory pressure
> can come from cgroup v1's memsw limit. In all these cases, recording
> the cost will only bias all subsequent reclaim, potentially even
> outside the scope of the original memcg.
> 
> So it's better not to record the cost if it comes from reclaim that
> was biased from the outset.
> 
> lru_note_cost() is a relatively expensive function, which traverses
> the memcg tree up to the root and takes the lruvec lock on each level.
> Overall, it's responsible for about 50% of the cycles spent on the
> lruvec lock, which can be a non-trivial amount under heavy memory
> pressure. So optimizing out a large number of lru_note_cost() calls
> is also beneficial from the performance perspective.
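
To make the cost described in the quoted paragraph concrete, here is a minimal userspace sketch of a cost-recording walk from a leaf cgroup up to the root. It loosely follows the shape of lru_note_cost() in mm/swap.c in recent kernels as I understand it (I/O weighted by SWAP_CLUSTER_MAX plus rotations, with both costs halved once they exceed a quarter of the LRU size). struct lruvec_model, note_cost_model() and IO_COST_WEIGHT are hypothetical stand-ins, and a counter replaces the real lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one cgroup level's lruvec. */
struct lruvec_model {
	struct lruvec_model *parent;     /* NULL at the root cgroup */
	unsigned long anon_cost;
	unsigned long file_cost;
	unsigned long lru_pages;         /* pages on this level's LRU lists */
	unsigned long lock_acquisitions; /* counts stand-in lock round trips */
};

/* Assumed weight of one I/O relative to one rotation; the kernel's
 * SWAP_CLUSTER_MAX is 32. */
#define IO_COST_WEIGHT 32UL

/* Record reclaim cost at every level from @lv up to the root. */
static void note_cost_model(struct lruvec_model *lv, int file,
			    unsigned long nr_io, unsigned long nr_rotated)
{
	unsigned long cost = nr_io * IO_COST_WEIGHT + nr_rotated;

	assert(lv != NULL);
	for (; lv != NULL; lv = lv->parent) {
		/* Stands in for taking this level's lru_lock. */
		lv->lock_acquisitions++;

		if (file)
			lv->file_cost += cost;
		else
			lv->anon_cost += cost;

		/* Age both costs once they outgrow a quarter of the LRU
		 * size, so recent events weigh more than old ones. */
		if (lv->anon_cost + lv->file_cost > lv->lru_pages / 4) {
			lv->anon_cost /= 2;
			lv->file_cost /= 2;
		}
		/* The lock would be released here. */
	}
}
```

For a leaf three levels deep, every call performs three lock round trips, one per ancestor, which is where the lock cycles mentioned in the quoted description come from.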

Does it actually help? It's under elevated pressure, when lru locks
are the most contended, that we also usually scan both lists.

The caveat with this patch is that, aside from the static noswap
scenario, modes can switch back and forth abruptly or even overlap.

So if you leave a pressure scenario and go back to cache trimming, you
will no longer age the cost information. The next spike could then
start out with potentially quite stale information.
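
Why stale costs matter follows from how they are consumed. The sketch below is a simplified userspace rendering of the anon/file split computed in get_scan_count() in mm/vmscan.c, as I understand it: each side's cost is padded with the total before being inverted, which is what keeps either list from getting less than a third of the pressure at swappiness 100. anon_pressure_pct() is a hypothetical helper, and returning an integer percentage is a simplification.

```c
#include <assert.h>

/* Upper bound of vm.swappiness in recent kernels. */
#define MAX_SWAPPINESS 200UL

/* Percentage (0..100) of scan pressure applied to the anon list,
 * given the recorded anon and file costs. */
static unsigned long anon_pressure_pct(unsigned long anon_cost,
				       unsigned long file_cost,
				       unsigned long swappiness)
{
	assert(swappiness <= MAX_SWAPPINESS);

	/* Pad each side with the total so neither list's effective cost
	 * can be more than twice the other's. */
	unsigned long total = anon_cost + file_cost;
	unsigned long a = total + anon_cost;
	unsigned long f = total + file_cost;
	unsigned long sum = a + f;

	/* Pressure is inversely proportional to each padded cost. */
	unsigned long ap = swappiness * (sum + 1) / (a + 1);
	unsigned long fp = (MAX_SWAPPINESS - swappiness) * (sum + 1) / (f + 1);

	return ap * 100 / (ap + fp);
}
```

With swappiness 100, anon_pressure_pct(100, 100, 100) gives 50, anon_pressure_pct(1000, 0, 100) gives 33 and anon_pressure_pct(0, 1000, 100) gives 66. Because the split depends only on the recorded costs, values that stop being updated during a cache-trimming phase keep producing the old split when the next spike arrives.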

Or say proactive reclaim has recently targeted anon, and there were
rotations and pageouts; that would be useful data for a reactive
reclaimer doing work at around the same time, or shortly thereafter.

So for everything but the static noswap case, the patch makes me
nervous. And I'm not sure it actually helps in the cases where it
would matter the most.
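
The proactive-then-reactive scenario can be replayed in a toy model. The sketch assumes the patch's rule is "record cost only when both lists are scanned", taken from the patch title rather than its diff, and compares it with a rule that skips recording only in the static noswap case. The enum values mirror the names of the kernel's enum scan_balance, but struct reclaim_event, replay() and both rule functions are hypothetical, and aging is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same names as the kernel's enum scan_balance; this copy is a model. */
enum scan_balance_model { SCAN_EQUAL, SCAN_FRACT, SCAN_ANON, SCAN_FILE };

/* One reclaim pass and the cost it would record (hypothetical). */
struct reclaim_event {
	enum scan_balance_model balance;
	bool swap_configured;
	unsigned long anon_cost;
	unsigned long file_cost;
};

/* Assumed patch rule: record only when both lists are scanned. */
static bool rule_both_lists_only(const struct reclaim_event *e)
{
	return e->balance == SCAN_EQUAL || e->balance == SCAN_FRACT;
}

/* Narrower rule: skip recording only when no swap is configured. */
static bool rule_skip_noswap_only(const struct reclaim_event *e)
{
	return e->swap_configured;
}

/* Replay a history of passes and sum what a rule lets through
 * (no aging, for brevity). */
static void replay(const struct reclaim_event *ev, size_t n,
		   bool (*should_note)(const struct reclaim_event *),
		   unsigned long *anon_cost, unsigned long *file_cost)
{
	assert(ev != NULL || n == 0);
	*anon_cost = 0;
	*file_cost = 0;
	for (size_t i = 0; i < n; i++) {
		if (!should_note(&ev[i]))
			continue;
		*anon_cost += ev[i].anon_cost;
		*file_cost += ev[i].file_cost;
	}
}
```

With an anon-only proactive pass (anon cost 500) followed by a reactive pass over both lists (10 each), the first rule leaves the reactive reclaimer with anon and file costs of 10 each, while the second also carries the 500 units of anon cost from the proactive pass.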

It might make more sense to look into the cost (ha) of the cost
recording itself. Can we turn it into a vmstat item? That would make
it lockless, and would get rstat batching up the cgroup tree, etc. This
doesn't need to be 100% precise and race-free, after all.
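
The vmstat suggestion can be illustrated with a small model: the hot path becomes a lockless per-CPU add on the leaf cgroup only, and a later flush folds the accumulated deltas into the group and its ancestors in a single pass. This is a sketch of the general idea under stated assumptions: struct memcg_model, NR_CPUS_MODEL and the helper functions are hypothetical, C11 atomics stand in for per-CPU counters, and the kernel's real vmstat and rstat code is structured differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4	/* hypothetical CPU count */

struct memcg_model {
	struct memcg_model *parent;
	/* Per-CPU pending deltas, updated without any lock. */
	atomic_ulong pending_anon[NR_CPUS_MODEL];
	atomic_ulong pending_file[NR_CPUS_MODEL];
	/* Totals, including deltas flushed from descendants. */
	unsigned long anon_cost;
	unsigned long file_cost;
};

static void memcg_model_init(struct memcg_model *m, struct memcg_model *parent)
{
	m->parent = parent;
	m->anon_cost = 0;
	m->file_cost = 0;
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		atomic_init(&m->pending_anon[cpu], 0);
		atomic_init(&m->pending_file[cpu], 0);
	}
}

/* Hot path: one relaxed atomic add on this CPU's slot, no tree walk. */
static void note_cost_lockless(struct memcg_model *m, int cpu, int file,
			       unsigned long cost)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	atomic_ulong *slot = file ? &m->pending_file[cpu] : &m->pending_anon[cpu];
	atomic_fetch_add_explicit(slot, cost, memory_order_relaxed);
}

/* Batched flush: drain the per-CPU deltas once, then propagate the
 * sums to this group and every ancestor. */
static void flush_costs(struct memcg_model *m)
{
	unsigned long anon = 0, file = 0;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		anon += atomic_exchange_explicit(&m->pending_anon[cpu], 0,
						 memory_order_relaxed);
		file += atomic_exchange_explicit(&m->pending_file[cpu], 0,
						 memory_order_relaxed);
	}
	for (struct memcg_model *p = m; p != NULL; p = p->parent) {
		p->anon_cost += anon;
		p->file_cost += file;
	}
}
```

Readers such as the scan balancing would see updated totals only after a flush, which fits the point that the numbers need not be exact.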