lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 12 Dec 2022 09:55:54 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     Mina Almasry <almasrymina@...gle.com>
Cc:     Tejun Heo <tj@...nel.org>, Zefan Li <lizefan.x@...edance.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Jonathan Corbet <corbet@....net>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Shakeel Butt <shakeelb@...gle.com>,
        Muchun Song <songmuchun@...edance.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Huang Ying <ying.huang@...el.com>,
        Yang Shi <yang.shi@...ux.alibaba.com>,
        Yosry Ahmed <yosryahmed@...gle.com>, weixugc@...gle.com,
        fvdl@...gle.com, bagasdotme@...il.com, cgroups@...r.kernel.org,
        linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org
Subject: Re: [PATCH v3] mm: Add nodes= arg to memory.reclaim

On Fri 02-12-22 14:35:31, Mina Almasry wrote:
> The nodes= arg instructs the kernel to only scan the given nodes for
> proactive reclaim. For example use cases, consider a 2 tier memory system:
> 
> nodes 0,1 -> top tier
> nodes 2,3 -> second tier
> 
> $ echo "1m nodes=0" > memory.reclaim
> 
> This instructs the kernel to attempt to reclaim 1m memory from node 0.
> Since node 0 is a top tier node, demotion will be attempted first. This
> is useful to direct proactive reclaim to specific nodes that are under
> pressure.
> 
> $ echo "1m nodes=2,3" > memory.reclaim
> 
> This instructs the kernel to attempt to reclaim 1m memory in the second tier,
> since this tier of memory has no demotion targets the memory will be
> reclaimed.
> 
> $ echo "1m nodes=0,1" > memory.reclaim
> 
> Instructs the kernel to reclaim memory from the top tier nodes, which can
> be desirable according to the userspace policy if there is pressure on
> the top tiers. Since these nodes have demotion targets, the kernel will
> attempt demotion first.
> 
> Since commit 3f1509c57b1b ("Revert "mm/vmscan: never demote for memcg
> reclaim""), the proactive reclaim interface memory.reclaim does both
> reclaim and demotion. Reclaim and demotion incur different latency costs
> to the jobs in the cgroup. Demoted memory would still be addressable
> by the userspace at a higher latency, but reclaimed memory would need to
> incur a pagefault.
> 
> The 'nodes' arg is useful to allow the userspace to control demotion
> and reclaim independently according to its policy: if the memory.reclaim
> is called on a node with demotion targets, it will attempt demotion first;
> if it is called on a node without demotion targets, it will only attempt
> reclaim.
> 
> Acked-by: Michal Hocko <mhocko@...e.com>
> Signed-off-by: Mina Almasry <almasrymina@...gle.com>

After discussion in [1] I have realized that I haven't really thought
through all the consequences of this patch and therefore I am retracting
my ack here. I am not nacking the patch at this statge but I also think
this shouldn't be merged now and we should really consider all the
consequences.

Let me summarize my main concerns here as well. The proposed
implementation doesn't apply the provided nodemask to the whole reclaim
process. This means that demotion can happen outside of the mask so the
the user request cannot really control demotion targets and that limits
the interface should there be any need for a finer grained control in
the future (see an example in [2]).
Another problem is that this can limit future reclaim extensions because
of existing assumptions of the interface [3] - specify only top-tier
node to force the aging without actually reclaiming any charges and
(ab)use the interface only for aging on multi-tier system. A change to
the reclaim to not demote in some cases could break this usecase.

My counter proposal would be to define the nodemask for memory.reclaim
as a domain to constrain the charge reclaim. That means both aging and
reclaim including demotion which is a part of aging. This will allow
to control where to demote for balancing purposes (e.g. demote to node 2
rather than 3) which is impossible with the proposed scheme.

[1] http://lkml.kernel.org/r/20221206023406.3182800-1-almasrymina@google.com
[2] http://lkml.kernel.org/r/Y5bnRtJ6sojtjgVD@dhcp22.suse.cz
[3] http://lkml.kernel.org/r/CAAPL-u8rgW-JACKUT5ChmGSJiTDABcDRjNzW_QxMjCTk9zO4sg@mail.gmail.com
-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ