lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <zutbi6jjx6rj2beytkp2ihpyxkuvg43ggsglfhimluojko4frf@gacgibzen5k4>
Date: Wed, 25 Jun 2025 16:10:16 -0700
From: Shakeel Butt <shakeel.butt@...ux.dev>
To: Davidlohr Bueso <dave@...olabs.net>
Cc: akpm@...ux-foundation.org, mhocko@...nel.org, hannes@...xchg.org, 
	roman.gushchin@...ux.dev, yosryahmed@...gle.com, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/4] mm: introduce per-node proactive reclaim interface

On Mon, Jun 23, 2025 at 11:58:51AM -0700, Davidlohr Bueso wrote:
> This adds support for allowing proactive reclaim in general on a
> NUMA system. A per-node interface extends support for beyond a
> memcg-specific interface, respecting the current semantics of
> memory.reclaim: respecting aging LRU and not supporting
> artificially triggering eviction on nodes belonging to non-bottom
> tiers.
> 
> This patch allows userspace to do:
> 
>      echo "512M swappiness=10" > /sys/devices/system/node/nodeX/reclaim
> 
> One of the premises for this is to semantically align as best as
> possible with memory.reclaim. During a brief time memcg did
> support nodemask until 55ab834a86a9 (Revert "mm: add nodes=
> arg to memory.reclaim"), for which semantics around reclaim
> (eviction) vs demotion were not clear, rendering charging
> expectations to be broken.
> 
> With this approach:
> 
> 1. Users who do not use memcg can benefit from proactive reclaim.
> The memcg interface is not NUMA aware and there are usecases that
> are focusing on NUMA balancing rather than workload memory footprint.
> 
> 2. Proactive reclaim on top tiers will trigger demotion, for which
> memory is still byte-addressable. Reclaiming on the bottom nodes
> will trigger evicting to swap (the traditional sense of reclaim).
> This follows the semantics of what is today part of the aging process
> on tiered memory, mirroring what every other form of reclaim does
> (reactive and memcg proactive reclaim). Furthermore per-node proactive
> reclaim is not as susceptible to the memcg charging problem mentioned
> above.
> 
> 3. Unlike the nodes= arg, this interface avoids confusing semantics,
> such as what exactly the user wants when mixing top-tier and low-tier
> nodes in the nodemask. Further per-node interface is less exposed to
> "free up memory in my container" usecases, where eviction is intended.
> 
> 4. Users that *really* want to free up memory can use proactive reclaim
> on nodes knowingly to be on the bottom tiers to force eviction in a
> natural way - higher access latencies are still better than swap.
> If compelled, while no guarantees and perhaps not worth the effort,
> users could also also potentially follow a ladder-like approach to
> eventually free up the memory. Alternatively, perhaps an 'evict' option
> could be added to the parameters for both memory.reclaim and per-node
> interfaces to force this action unconditionally.
> 
> Signed-off-by: Davidlohr Bueso <dave@...olabs.net>

Overall looks good but I will try to dig deeper in next couple of days
(or weeks).

One orthogonal thought: I wonder if we want a unified aging (hotness or
generation or active/inactive) view of jobs/memcgs/system. At the moment
due to the way LRUs are implemented i.e. per-memcg per-node, we can have
different view of these LRUs even for the same memcg. For example the
hottest pages in low tier node might be colder than coldest pages in the
top tier. Not sure how to implement it in a scalable way.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ