linux-kernel - Re: [PATCH v1] mm: disable top-tier fallback to reclaim on proactive reclaim

Message-ID: <CAHS8izMPbP1XAKCMKJ6UF39uGNv6k_fkMDgS6DR+MF9OucLhEg@mail.gmail.com>
Date:   Fri, 2 Dec 2022 13:52:55 -0800
From:   Mina Almasry <almasrymina@...gle.com>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     Huang Ying <ying.huang@...el.com>,
        Yang Shi <yang.shi@...ux.alibaba.com>,
        Yosry Ahmed <yosryahmed@...gle.com>,
        Tim Chen <tim.c.chen@...ux.intel.com>, weixugc@...gle.com,
        shakeelb@...gle.com, gthelen@...gle.com, fvdl@...gle.com,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1] mm: disable top-tier fallback to reclaim on proactive reclaim

On Fri, Dec 2, 2022 at 1:38 PM Andrew Morton <akpm@...ux-foundation.org> wrote:
>
> On Thu,  1 Dec 2022 15:33:17 -0800 Mina Almasry <almasrymina@...gle.com> wrote:
>
> > Reclaiming directly from top tier nodes breaks the aging pipeline of
> > memory tiers.  If we have a RAM -> CXL -> storage hierarchy, we
> > should demote from RAM to CXL and from CXL to storage. If we reclaim
> > a page from RAM, it means we 'demote' it directly from RAM to storage,
> > bypassing potentially a huge amount of pages colder than it in CXL.
> >
> > However disabling reclaim from top tier nodes entirely would cause ooms
> > in edge scenarios where lower tier memory is unreclaimable for whatever
> > reason, e.g. memory being mlocked() or too hot to reclaim.  In these
> > cases we would rather the job run with a performance regression rather
> > than it oom altogether.
> >
> > However, we can disable reclaim from top tier nodes for proactive reclaim.
> > That reclaim is not real memory pressure, and we don't have any cause to
> > be breaking the aging pipeline.
> >
>
> Is this purely from code inspection, or are there quantitative
> observations to be shared?
>

This is from code inspection, but it also holds by definition.
Proactive reclaim is what happens when userspace does:

    echo "1m" > /path/to/cgroup/memory.reclaim

At that point the kernel tries to reclaim 1 MB from that cgroup at
userspace's behest, regardless of the actual memory pressure in the
cgroup. That is why proactive reclaim is not real memory pressure, as
I state in the commit message.
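
For illustration only (this is not part of the patch), the same request
could be issued from C roughly as follows. request_proactive_reclaim()
is a hypothetical helper name, and the path is whatever memory.reclaim
file belongs to the target cgroup; the sketch only assumes that the file
accepts a size string such as "1m" in a single write, as in the echo
example above.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to proactively reclaim `amount`
 * (e.g. "1m") by writing it to a cgroup's memory.reclaim file.
 * Returns 0 on success or a negative errno value on failure.
 */
int request_proactive_reclaim(const char *reclaim_path, const char *amount)
{
    int fd = open(reclaim_path, O_WRONLY);
    if (fd < 0)
        return -errno;

    /* Issue the request as one write so the kernel sees the whole string. */
    size_t len = strlen(amount);
    ssize_t written = write(fd, amount, len);
    int err = 0;

    if (written < 0)
        err = -errno;   /* the kernel rejected or could not satisfy the request */
    else if ((size_t)written != len)
        err = -EIO;     /* short write: treat as a failure in this sketch */

    close(fd);
    return err;
}
```

A caller would pass the cgroup's memory.reclaim path and a size string,
for example request_proactive_reclaim("/path/to/cgroup/memory.reclaim",
"1m"), and treat a negative return value as "the requested amount was
not reclaimed".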

Proactive reclaim is triggered in the code by memory_reclaim():
https://elixir.bootlin.com/linux/v6.1-rc7/source/mm/memcontrol.c#L6572

Which sets MEMCG_RECLAIM_PROACTIVE:
https://elixir.bootlin.com/linux/v6.1-rc7/source/mm/memcontrol.c#L6586

Which in turn sets sc->proactive:
https://elixir.bootlin.com/linux/v6.1-rc7/source/mm/vmscan.c#L6743

In my patch I only allow falling back to reclaim from top tier nodes
if !sc->proactive.
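
The rule can be modeled in user space as a small predicate. This is a
simplified sketch under stated assumptions, not the patch itself:
struct scan_control_model and may_reclaim_top_tier() are hypothetical
names, and only the proactive flag mirrors sc->proactive.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel's struct scan_control.
 * Only the flag relevant to this discussion is modeled.
 */
struct scan_control_model {
    bool proactive; /* true when reclaim was requested via memory.reclaim */
};

/*
 * Decide whether pages that could not be demoted from a top tier node
 * may be reclaimed directly instead.
 *
 * Under real memory pressure the fallback is allowed, so the job slows
 * down rather than hitting an OOM. Under proactive reclaim it is not,
 * so the aging pipeline across tiers is preserved.
 */
bool may_reclaim_top_tier(const struct scan_control_model *sc,
                          size_t nr_undemoted)
{
    if (nr_undemoted == 0)
        return false; /* everything was demoted, nothing to fall back on */

    return !sc->proactive;
}
```

With this model, a scan triggered by real pressure (proactive == false)
that leaves pages undemoted may fall back to reclaiming them, while a
scan triggered through memory.reclaim (proactive == true) leaves them in
place.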

I was in the process of sending a v2 with the comment fix, btw, but
I'll hold off since it seems you have already merged the patch into
mm-unstable. Thanks! If I end up sending another version of the patch,
it will include the comment fix.