Message-ID: <CAHS8izPxMJzy_Axixkydvsw0ODHz9R7XU6WAtGJKZuMH0i=ANA@mail.gmail.com>
Date:   Fri, 9 Dec 2022 13:39:44 -0800
From:   Mina Almasry <almasrymina@...gle.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Wei Xu <weixugc@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Shakeel Butt <shakeelb@...gle.com>,
        Muchun Song <songmuchun@...edance.com>,
        Huang Ying <ying.huang@...el.com>,
        Yang Shi <yang.shi@...ux.alibaba.com>,
        Yosry Ahmed <yosryahmed@...gle.com>, fvdl@...gle.com,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] [mm-unstable] mm: Fix memcg reclaim on memory tiered systems

On Fri, Dec 9, 2022 at 1:16 PM Michal Hocko <mhocko@...e.com> wrote:
>
> On Fri 09-12-22 08:41:47, Wei Xu wrote:
> > On Fri, Dec 9, 2022 at 12:08 AM Michal Hocko <mhocko@...e.com> wrote:
> > >
> > > On Thu 08-12-22 16:59:36, Wei Xu wrote:
> > > [...]
> > > > > What I really mean is to add demotion nodes to the nodemask along with
> > > > > the set of nodes you want to reclaim from. To me that sounds like a
> > > > > more natural interface allowing for all sorts of usecases:
> > > > > - free up demotion targets (only specify demotion nodes in the mask)
> > > > > - control where to demote (e.g. select specific demotion target(s))
> > > > > - do not demote at all (skip demotion nodes from the node mask)
> > > >
> > > > For clarification, do you mean to add another argument (e.g.
> > > > demotion_nodes) in addition to the "nodes" argument?
> > >
> > > No, nodes=mask argument should control the domain where the memory
> > > reclaim should happen. That includes both aging and the reclaim. If the
> > > mask doesn't contain any lower tier node then no demotion will happen.
> > > If only a subset of lower tiers are specified then only those could be
> > > used for the demotion process. Or put it otherwise, the nodemask is not
> > > only used to filter out zonelists during reclaim it also restricts
> > > migration targets.
> > >
> > > Is this more clear now?
> >

I think putting the demotion sources and demotion targets in the same
nodemask is a bit confusing and error-prone. IIUC the user puts both
the demotion sources and the demotion targets in the nodemask, and the
kernel infers which is which depending on whether the node is a
top-tier node or a bottom-tier node. I think this will become
ambiguous in the future. What happens when the machine has N memory
tiers and the user specifies a node in a middle tier in the nodemask?
Does that mean the user wants demotion from this node or to it? Middle
memory tiers can act as both...
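
To make the ambiguity concrete, here is a rough model in Python. It is
not kernel code: the node ids, the 3-tier layout, and the helper names
(infer_roles, ambiguous_nodes) are all made up for illustration, and it
assumes roles are inferred purely from the tier of each node in a
single nodemask.

```python
# Hypothetical 3-tier machine: node id -> tier (0 is the top tier).
NODE_TIER = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}
LOWEST_TIER = max(NODE_TIER.values())

def infer_roles(nodemask):
    """Return the roles each node in a single nodemask could play."""
    roles = {}
    for node in sorted(nodemask):
        tier = NODE_TIER[node]
        possible = set()
        if tier < LOWEST_TIER:
            possible.add("source")  # a lower tier exists to demote into
        if tier > 0:
            possible.add("target")  # a higher tier can demote into it
        roles[node] = possible
    return roles

def ambiguous_nodes(nodemask):
    """Nodes whose role cannot be inferred from the mask alone."""
    return [n for n, r in infer_roles(nodemask).items() if len(r) > 1]
```

In this model a mask with only top-tier and bottom-tier nodes is never
ambiguous, which is why the single nodemask looks fine on a two-tier
machine. As soon as a middle-tier node (2 or 3 here) is in the mask, it
could be either a source or a target.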

I think if your goal is to constrain demotion targets, then a much
clearer and more future-proof way is to simply add a second arg to
memory.reclaim: "allowed_demotion_targets=".
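
A minimal sketch of how a separate argument could remove the guesswork,
again as a Python model and not kernel code. The tier layout and the
helper demotion_targets are hypothetical. Two things here are
assumptions, not part of any existing interface: a node may demote to
any lower tier, and leaving the argument unset means "no restriction"
while an empty set means "do not demote".

```python
# Hypothetical 3-tier machine: node id -> tier (0 is the top tier).
NODE_TIER = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}

def demotion_targets(node, allowed_demotion_targets=None):
    """Nodes that `node` may demote to when targets are a separate arg.

    None means no restriction (assumption); an empty set disables demotion.
    """
    # Candidate targets: every node in a strictly lower tier (assumption).
    lower = {n for n, t in NODE_TIER.items() if t > NODE_TIER[node]}
    if allowed_demotion_targets is None:
        return lower
    return lower & set(allowed_demotion_targets)
```

With the source named by "nodes=" and the targets named separately, a
middle-tier node is unambiguous: demotion_targets(2, {3, 4}) only ever
yields node 4, since node 3 sits in the same tier as node 2.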

> > In that case, how can we request demotion only from toptier nodes
> > (without counting any reclaimed bytes from other nodes),  which is our
> > memory tiering use case?
>
> I am not sure I follow. Could you be more specific please?
>
> > Besides, when both toptier and demotion nodes are specified, the
> > demoted pages should only be counted as aging and not be counted
> > towards the requested bytes of try_to_free_mem_cgroup_pages(), which
> > is what this patch tries to address.
>
> This should be addressed by
> http://lkml.kernel.org/r/Y5B1K5zAE0PkjFZx@dhcp22.suse.cz, no?

I think I provided a test case in [1] showing very clearly that this
breaks one of our use cases, i.e. the use case where the user is
asking to demote X bytes from the top tier nodes to the lower tier
nodes. I would not like to proceed with a fix that breaks one of our
use cases. I believe I provided in this patch a fix that caters to all
existing users, and we should take the fix in this patch over a fix
that breaks use cases.

[1] https://lore.kernel.org/all/CAHS8izMKK107wVFSJvg36nQ=WzXd8_cjYBtR0p47L+XLYUSsqA@mail.gmail.com/

> --
> Michal Hocko
> SUSE Labs
