Message-ID: <Y5g41HF2TcLzro4o@dhcp22.suse.cz>
Date:   Tue, 13 Dec 2022 09:33:24 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     Mina Almasry <almasrymina@...gle.com>
Cc:     Tejun Heo <tj@...nel.org>, Zefan Li <lizefan.x@...edance.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Jonathan Corbet <corbet@....net>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Shakeel Butt <shakeelb@...gle.com>,
        Muchun Song <songmuchun@...edance.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Huang Ying <ying.huang@...el.com>,
        Yang Shi <yang.shi@...ux.alibaba.com>,
        Yosry Ahmed <yosryahmed@...gle.com>, weixugc@...gle.com,
        fvdl@...gle.com, bagasdotme@...il.com, cgroups@...r.kernel.org,
        linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org
Subject: Re: [PATCH v3] mm: Add nodes= arg to memory.reclaim

On Mon 12-12-22 16:54:27, Mina Almasry wrote:
> On Mon, Dec 12, 2022 at 12:55 AM Michal Hocko <mhocko@...e.com> wrote:
[...]
> > Let me summarize my main concerns here as well. The proposed
> > implementation doesn't apply the provided nodemask to the whole reclaim
> > process. This means that demotion can happen outside of the mask, so the
> > user request cannot really control demotion targets, and that limits
> > the interface should there be any need for a finer grained control in
> > the future (see an example in [2]).
> > Another problem is that this can limit future reclaim extensions because
> > of existing assumptions about the interface [3]: specifying only top-tier
> > nodes to force aging without actually reclaiming any charges, i.e.
> > (ab)using the interface only for aging on a multi-tier system. A change to
> > reclaim so that it does not demote in some cases could break this use case.
> >
> 
> I think this is correct. My use case is to ask the kernel to perform
> demotion without reclaim in the cgroup, and the reason for that is
> stated in the commit message:
> 
> "Reclaim and demotion incur different latency costs to the jobs in the
> cgroup. Demoted memory would still be addressable by the userspace at
> a higher latency, but reclaimed memory would need to incur a
> pagefault."
> 
> For jobs of some latency tiers, we would like to trigger proactive
> demotion (which incurs relatively low latency on the job), but not
> trigger proactive reclaim (which incurs a pagefault). I initially had
> proposed a separate interface for this, but Johannes directed me to
> this interface instead in [1]. In the same email, Johannes also tells
> me that Meta's reclaim stack relies on memory.reclaim triggering
> demotion, so it seems that I'm not the first to take a dependency on
> this. Additionally, in [2] Johannes says it would be great if, in the
> long term, reclaim policy and demotion policy do not diverge.
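
A minimal Python sketch of the request pattern described above, assuming a cgroup v2 hierarchy whose cgroup directory the caller supplies. Writing a plain byte count to memory.reclaim is the documented interface; the " nodes=<list>" suffix follows the syntax proposed by this patch and is an assumption, since a kernel without the patch rejects it. The helper names and the node_tiers map are hypothetical. Note that nothing in the request says "demote": it only restricts reclaim to the fastest nodes and relies on the kernel choosing demotion there, which is the implicit behavior under debate.

```python
import os


def format_reclaim_request(nbytes, nodes=None):
    """Build the string written to a cgroup v2 memory.reclaim file.

    The plain "<bytes>" form is the documented interface. The optional
    " nodes=<list>" suffix follows the patch under discussion and is an
    assumption: a kernel without that patch rejects it.
    """
    if nbytes <= 0:
        raise ValueError("reclaim amount must be positive")
    request = str(nbytes)
    if nodes:
        # Comma-separated, de-duplicated NUMA node ids, e.g. "0,1"
        # (assumed nodelist format).
        request += " nodes=" + ",".join(str(n) for n in sorted(set(nodes)))
    return request


def top_tier_nodes(node_tiers):
    """Return the node ids in the highest (fastest) tier.

    node_tiers is a hypothetical {node_id: tier_rank} map where a lower
    rank means faster memory; how an agent obtains it is out of scope.
    """
    best = min(node_tiers.values())
    return sorted(n for n, rank in node_tiers.items() if rank == best)


def request_reclaim(cgroup_dir, request):
    """Write one request to <cgroup_dir>/memory.reclaim.

    If less than the requested amount is reclaimed, the kernel fails the
    write with EAGAIN; Python surfaces that as an OSError when the
    buffered write is flushed on close.
    """
    path = os.path.join(cgroup_dir, "memory.reclaim")
    with open(path, "w") as f:
        f.write(request)
```

For example, format_reclaim_request(1 << 30, top_tier_nodes({0: 0, 1: 0, 2: 1})) yields "1073741824 nodes=0,1", which would then be passed to request_reclaim() along with the job's cgroup directory.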

I do recognize your need to control demotion, but I argue that it is
a bad idea to rely on an implicit behavior of memory reclaim and on an
interface which is _documented_ to primarily _reclaim_ memory.

Really, consider that the current demotion implementation changes in
the future and that, based on a newly added heuristic, memory reclaim or
compression becomes preferred over migration to a different tier. This
might completely break your current assumptions and your use case,
which relies on implicit demotion behavior. Do you see that as a
potential problem at all? What shall we do in that case? Special-case
memory.reclaim behavior?

Now to your specific use case. If there is a need for memory
distribution balancing, then fine, but this should be a well-defined
interface. E.g. is there a need to control not only demotion but
promotion as well? I haven't heard anybody requesting that so far,
but I can easily imagine that, just as memory reclaim has been
outsourced to userspace, someone might want to do the same thing with
NUMA balancing because $REASONS. Should that ever happen, I am pretty
sure hooking into memory.reclaim is not really a great idea.

See where I am coming from?

> [1] https://lore.kernel.org/linux-mm/Y35fw2JSAeAddONg@cmpxchg.org/
> [2] https://lore.kernel.org/linux-mm/Y36fIGFCFKiocAd6@cmpxchg.org/
-- 
Michal Hocko
SUSE Labs
