linux-kernel - Re: [PATCH v10 10/14] mm: multi-gen LRU: kill switch

Message-Id: <20220426152237.21d3f173eded69c0f63911f0@linux-foundation.org>
Date:   Tue, 26 Apr 2022 15:22:37 -0700
From:   Andrew Morton <akpm@...ux-foundation.org>
To:     Yu Zhao <yuzhao@...gle.com>
Cc:     Tejun Heo <tj@...nel.org>, Stephen Rothwell <sfr@...hwell.id.au>,
        Linux-MM <linux-mm@...ck.org>, Andi Kleen <ak@...ux.intel.com>,
        Aneesh Kumar <aneesh.kumar@...ux.ibm.com>,
        Barry Song <21cnbao@...il.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Hillf Danton <hdanton@...a.com>, Jens Axboe <axboe@...nel.dk>,
        Jesse Barnes <jsbarnes@...gle.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Jonathan Corbet <corbet@....net>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>,
        Mel Gorman <mgorman@...e.de>,
        Michael Larabel <Michael@...haellarabel.com>,
        Michal Hocko <mhocko@...nel.org>,
        Mike Rapoport <rppt@...nel.org>,
        Rik van Riel <riel@...riel.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Will Deacon <will@...nel.org>,
        Ying Huang <ying.huang@...el.com>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Kernel Page Reclaim v2 <page-reclaim@...gle.com>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        Brian Geffon <bgeffon@...gle.com>,
        Jan Alexander Steffens <heftig@...hlinux.org>,
        Oleksandr Natalenko <oleksandr@...alenko.name>,
        Steven Barrett <steven@...uorix.net>,
        Suleiman Souhlal <suleiman@...gle.com>,
        Daniel Byrne <djbyrne@....edu>,
        Donald Carr <d@...os-reins.com>,
        Holger Hoffstätte <holger@...lied-asynchrony.com>,
        Konstantin Kharlamov <Hi-Angel@...dex.ru>,
        Shuang Zhai <szhai2@...rochester.edu>,
        Sofia Trinh <sofia.trinh@....works>,
        Vaibhav Jain <vaibhav@...ux.ibm.com>
Subject: Re: [PATCH v10 10/14] mm: multi-gen LRU: kill switch

On Tue, 26 Apr 2022 14:57:15 -0600 Yu Zhao <yuzhao@...gle.com> wrote:

> On Mon, Apr 11, 2022 at 8:16 PM Andrew Morton <akpm@...ux-foundation.org> wrote:
> >
> > On Wed,  6 Apr 2022 21:15:22 -0600 Yu Zhao <yuzhao@...gle.com> wrote:
> >
> > > Add /sys/kernel/mm/lru_gen/enabled as a kill switch. Components that
> > > can be disabled include:
> > >   0x0001: the multi-gen LRU core
> > >   0x0002: walking page table, when arch_has_hw_pte_young() returns
> > >           true
> > >   0x0004: clearing the accessed bit in non-leaf PMD entries, when
> > >           CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y
> > >   [yYnN]: apply to all the components above
> > > E.g.,
> > >   echo y >/sys/kernel/mm/lru_gen/enabled
> > >   cat /sys/kernel/mm/lru_gen/enabled
> > >   0x0007
> > >   echo 5 >/sys/kernel/mm/lru_gen/enabled
> > >   cat /sys/kernel/mm/lru_gen/enabled
> > >   0x0005
> >
> > I'm shocked that this actually works.  How does it work?  Are existing
> > pages & folios drained over time or synchronously?
> 
> Basically we have a double-throw switch, and once flipped, new (isolated)
> pages can only be added to the lists of the current implementation.
> Existing pages on the lists of the previous implementation are
> synchronously drained (isolated and then re-added), with
> cond_resched() of course.
> 
> > Supporting
> > structures remain allocated, available for reenablement?
> 
> Correct.
> 
> > Why is it thought necessary to have this?  Is it expected to be
> > permanent?
> 
> This is almost a must for large scale deployments/experiments.
> 
> For deployments, we need to keep fix rollout (high priority) and
> feature enabling (low priority) separate. Rolling out multiple
> binaries works but will make the process slower and more painful. So
> generally for each release, there is only one binary to roll out, and
> unless it's impossible, new features are disabled by default. Once a
> rollout completes, i.e., reaches enough population and remains stable,
> new features are turned on gradually. If something goes wrong with a
> new feature, we turn off that feature rather than roll back the
> kernel.
> 
> Similarly, for A/B experiments, we don't want to use two binaries.

Please let's spell out this sort of high-level thinking in the
changelog.

From what you're saying, this is a transient thing.  It sounds like
this enablement is only needed while mglru is at an early stage.  Once
it has matured more, successive rollouts will have essentially the
same mglru implementation and being able to disable mglru at runtime
will no longer be required?

I guess the capability is reasonably simple/small and we can live with
it, but does it have a long-term future?

I mean, when organizations such as Google start adopting the mglru
implementation which is present in Linus's tree we're, what, a year or
more into the future?  Will they still need a kill switch then?
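
For reference, the bitmask semantics of /sys/kernel/mm/lru_gen/enabled
quoted above from the patch description can be sketched in userspace
as follows.  This is only an illustration, not the kernel's parser: the
names LruGenCaps, parse_enabled(), format_enabled() and describe() are
hypothetical, the rejection of unknown bits is an assumption, and the
sketch never touches sysfs.  Only the three bit values, the [yYnN]
shorthand and the 0x%04x output format come from the quoted text.

```python
from enum import IntFlag


class LruGenCaps(IntFlag):
    """Component bits as listed in the quoted patch description.

    The member names are illustrative, not taken from the kernel source.
    """
    CORE = 0x0001           # the multi-gen LRU core
    MM_WALK = 0x0002        # walking page tables
    NONLEAF_YOUNG = 0x0004  # clearing the accessed bit in non-leaf PMD entries


ALL_CAPS = LruGenCaps.CORE | LruGenCaps.MM_WALK | LruGenCaps.NONLEAF_YOUNG


def parse_enabled(text: str) -> LruGenCaps:
    """Interpret a value as it might be written to the 'enabled' file."""
    text = text.strip()
    if text in ("y", "Y"):
        return ALL_CAPS
    if text in ("n", "N"):
        return LruGenCaps(0)
    value = int(text, 0)  # accepts "5" as well as "0x0005"
    if value & ~int(ALL_CAPS):
        # Assumption: bits outside the documented set are rejected.
        raise ValueError(f"unknown component bits in {text!r}")
    return LruGenCaps(value)


def format_enabled(caps: LruGenCaps) -> str:
    """Render the mask the way the quoted 'cat' output shows it."""
    return f"0x{int(caps):04x}"


def describe(caps: LruGenCaps) -> list:
    """List the names of the components that are switched on."""
    return [c.name for c in LruGenCaps if c in caps]


if __name__ == "__main__":
    for written in ("y", "5", "n"):
        caps = parse_enabled(written)
        print(written, "->", format_enabled(caps), describe(caps))
```

Under these assumptions, writing 5 corresponds to the core plus the
non-leaf PMD handling, with page table walking switched off.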