linux-kernel - Re: [RFC PATCH V1] mm: Disable demotion from proactive reclaim

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <874juonbmv.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date:   Thu, 24 Nov 2022 13:51:20 +0800
From:   "Huang, Ying" <ying.huang@...el.com>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     Mina Almasry <almasrymina@...gle.com>,
        Yang Shi <yang.shi@...ux.alibaba.com>,
        Yosry Ahmed <yosryahmed@...gle.com>,
        Tim Chen <tim.c.chen@...ux.intel.com>, weixugc@...gle.com,
        shakeelb@...gle.com, gthelen@...gle.com, fvdl@...gle.com,
        Michal Hocko <mhocko@...nel.org>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Muchun Song <songmuchun@...edance.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
        linux-mm@...ck.org
Subject: Re: [RFC PATCH V1] mm: Disable demotion from proactive reclaim

Hi, Johannes,

Johannes Weiner <hannes@...xchg.org> writes:
[...]
>
> The fallback to reclaim actually strikes me as wrong.
>
> Think of reclaim as 'demoting' the pages to the storage tier. If we
> have a RAM -> CXL -> storage hierarchy, we should demote from RAM to
> CXL and from CXL to storage. If we reclaim a page from RAM, it means
> we 'demote' it directly from RAM to storage, bypassing potentially a
> huge amount of pages colder than it in CXL. That doesn't seem right.
>
> If demotion fails, IMO it shouldn't satisfy the reclaim request by
> breaking the layering. Rather it should deflect that pressure to the
> lower layers to make room. This makes sure we maintain an aging
> pipeline that honors the memory tier hierarchy.

Yes.  I think that we should avoid to fall back to reclaim as much as
possible too.  Now, when we allocate memory for demotion
(alloc_demote_page()), __GFP_KSWAPD_RECLAIM is used.  So, we will trigger
kswapd reclaim on lower tier node to free some memory to avoid fall back
to reclaim on current (higher tier) node.  This may be not good enough,
for example, the following patch from Hasan may help via waking up
kswapd earlier.

https://lore.kernel.org/linux-mm/b45b9bf7cd3e21bca61d82dcd1eb692cd32c122c.1637778851.git.hasanalmaruf@fb.com/

Do you know what is the next step plan for this patch?

Should we do even more?

>From another point of view, I still think that we can use falling back
to reclaim as the last resort to avoid OOM in some special situations,
for example, most pages in the lowest tier node are mlock() or too hot
to be reclaimed.

> So I'm hesitant to design cgroup controls around the current behavior.
>

Best Regards,
Huang, Ying