linux-kernel - Re: [PATCH v9 05/14] mm: multi-gen LRU: groundwork

Message-ID: <CAOUHufbkSffDkUgv03kGdNs2-6V-fTHRKr7hjsZJubA_yWU7bQ@mail.gmail.com>
Date:   Mon, 21 Mar 2022 22:52:42 -0600
From:   Yu Zhao <yuzhao@...gle.com>
To:     Prarit Bhargava <prarit@...hat.com>,
        Justin Forbes <jforbes@...oraproject.org>
Cc:     Andi Kleen <ak@...ux.intel.com>, kernel-team@...ts.ubuntu.com,
        Vaibhav Jain <vaibhav@...ux.ibm.com>,
        Rik van Riel <riel@...riel.com>,
        Mel Gorman <mgorman@...e.de>,
        Catalin Marinas <catalin.marinas@....com>,
        Johannes Weiner <hannes@...xchg.org>,
        Aneesh Kumar <aneesh.kumar@...ux.ibm.com>,
        Brian Geffon <bgeffon@...gle.com>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
        Jesse Barnes <jsbarnes@...gle.com>,
        Sofia Trinh <sofia.trinh@....works>,
        "Huang, Ying" <ying.huang@...el.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Steven Barrett <steven@...uorix.net>,
        Shuang Zhai <szhai2@...rochester.edu>,
        Donald Carr <d@...os-reins.com>,
        Oleksandr Natalenko <oleksandr@...alenko.name>,
        Holger Hoffstätte <holger@...lied-asynchrony.com>,
        Will Deacon <will@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Jonathan Corbet <corbet@....net>,
        Mike Rapoport <rppt@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Jens Axboe <axboe@...nel.dk>, Hillf Danton <hdanton@...a.com>,
        Michal Hocko <mhocko@...nel.org>,
        kernel <kernel@...ts.fedoraproject.org>,
        Suleiman Souhlal <suleiman@...gle.com>,
        Daniel Byrne <djbyrne@....edu>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        Konstantin Kharlamov <Hi-Angel@...dex.ru>,
        Matthew Wilcox <willy@...radead.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Michael Larabel <Michael@...haellarabel.com>,
        Linux-MM <linux-mm@...ck.org>,
        Kernel Page Reclaim v2 <page-reclaim@...gle.com>,
        Jan Alexander Steffens <heftig@...hlinux.org>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH v9 05/14] mm: multi-gen LRU: groundwork

On Mon, Mar 21, 2022 at 1:18 PM Prarit Bhargava <prarit@...hat.com> wrote:
>
> On 3/21/22 14:58, Justin Forbes wrote:
> > On Mon, Mar 14, 2022 at 4:30 AM Yu Zhao <yuzhao@...gle.com> wrote:
> >>
> >> On Mon, Mar 14, 2022 at 2:09 AM Huang, Ying <ying.huang@...el.com> wrote:
> >>>
> >>> Hi, Yu,
> >>>
> >>> Yu Zhao <yuzhao@...gle.com> writes:
> >>>> diff --git a/mm/Kconfig b/mm/Kconfig
> >>>> index 3326ee3903f3..747ab1690bcf 100644
> >>>> --- a/mm/Kconfig
> >>>> +++ b/mm/Kconfig
> >>>> @@ -892,6 +892,16 @@ config ANON_VMA_NAME
> >>>>          area from being merged with adjacent virtual memory areas due to the
> >>>>          difference in their name.
> >>>>
> >>>> +# the multi-gen LRU {
> >>>> +config LRU_GEN
> >>>> +     bool "Multi-Gen LRU"
> >>>> +     depends on MMU
> >>>> +     # the following options can use up the spare bits in page flags
> >>>> +     depends on !MAXSMP && (64BIT || !SPARSEMEM || SPARSEMEM_VMEMMAP)
> >>>
> > >>> LRU_GEN depends on !MAXSMP. So, what is the maximum NR_CPUS supported
> >>> by LRU_GEN?
> >>
> >> LRU_GEN doesn't really care about NR_CPUS. IOW, it doesn't impose a
> >> max number. The dependency is with NODES_SHIFT selected by MAXSMP:
> >>      default "10" if MAXSMP
> >> This combined with LAST_CPUPID_SHIFT can exhaust the spare bits in page flags.
> >>
> >> MAXSMP is meant for kernel developers to test their code, and it
> >> should not be used in production [1]. But some distros unfortunately
> >> ship kernels built with this option, e.g., Fedora and Ubuntu. And
> >> their users reported build errors to me after they applied MGLRU on
> >> those kernels ("Not enough bits in page flags"). Let me add Fedora and
> >> Ubuntu to this thread.
> >>
> >> Fedora and Ubuntu,
> >>
> >> Could you please clarify if there is a reason to ship kernels built
> >> with MAXSMP? Otherwise, please consider disabling this option. Thanks.
> >>
> >> As per above, MAXSMP enables ridiculously large numbers of CPUs and
> >> NUMA nodes for testing purposes. It is detrimental to performance,
> >> e.g., CPUMASK_OFFSTACK.
> >
> > It was enabled for Fedora, and RHEL because we did need more than 512
> > CPUs, originally only in RHEL until SGI (years ago) complained that
> > they were testing very large machines with Fedora.  The testing done
> > on RHEL showed that the performance impact was minimal.   For a very
> > long time we had MAXSMP off and carried a patch which allowed us to
> > turn on CPUMASK_OFFSTACK without debugging because there was supposed
> > to be "something else" coming.  In 2019 we gave up, dropped that patch
> > and just turned on MAXSMP.
> >
> > I do not have any metrics for how often someone runs Fedora on a
> > ridiculously large machine these days, but I would guess that number
> > is not 0.
>
> It is not 0.  I've seen data from large systems (1000+ logical threads)
> that are running Fedora albeit with a modified Fedora kernel.
>
> Additionally, the max limit for CPUs in RHEL is 1792; however, we have
> recently had a request to *double* that to 3584.  You should just assume
> that number will continue to increase.

Good to know. Thanks.

From the standpoint of overhead, I'd consider NR_CPUS=4096 and
NODES_SHIFT=7 as the next step, before going with MAXSMP.