Message-ID: <CAGsJ_4w5GM5r916XEz+gj=33A+b98kyJONLNpEnBMmX5XnPRmg@mail.gmail.com>
Date:   Fri, 28 Jan 2022 21:54:09 +1300
From:   Barry Song <21cnbao@...il.com>
To:     Yu Zhao <yuzhao@...gle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andi Kleen <ak@...ux.intel.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Hillf Danton <hdanton@...a.com>, Jens Axboe <axboe@...nel.dk>,
        Jesse Barnes <jsbarnes@...gle.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Jonathan Corbet <corbet@....net>,
        Matthew Wilcox <willy@...radead.org>,
        Mel Gorman <mgorman@...e.de>,
        Michael Larabel <Michael@...haellarabel.com>,
        Michal Hocko <mhocko@...nel.org>,
        Rik van Riel <riel@...riel.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Will Deacon <will@...nel.org>,
        Ying Huang <ying.huang@...el.com>,
        LAK <linux-arm-kernel@...ts.infradead.org>,
        Linux Doc Mailing List <linux-doc@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>, page-reclaim@...gle.com,
        x86 <x86@...nel.org>
Subject: Re: [PATCH v6 0/9] Multigenerational LRU Framework

On Tue, Jan 25, 2022 at 7:48 PM Yu Zhao <yuzhao@...gle.com> wrote:
>
> On Sun, Jan 23, 2022 at 06:43:06PM +1300, Barry Song wrote:
> > On Wed, Jan 5, 2022 at 7:17 PM Yu Zhao <yuzhao@...gle.com> wrote:
>
> <snipped>
>
> > > Large-scale deployments
> > > -----------------------
> > > We've rolled out MGLRU to tens of millions of Chrome OS users and
> > > about a million Android users. Google's fleetwide profiling [13] shows
> > > an overall 40% decrease in kswapd CPU usage, in addition to
> >
> > Hi Yu,
> >
> > Was the overall 40% decrease in kswapd CPU usage seen on x86 or arm64?
> > And I am curious how much we are taking advantage of NONLEAF_PMD_YOUNG.
> > Does it help a lot in decreasing the cpu usage?
>
> Hi Barry,
>
> The fleet-wide profiling data I shared was from x86. For arm64, I only
> have data from synthetic benchmarks at the moment, and it also shows
> similar improvements.
>
> For Chrome OS (individual users), walk_pte_range(), the function that
> would benefit from ARCH_HAS_NONLEAF_PMD_YOUNG, only uses a small
> portion (<4%) of kswapd CPU time. So ARCH_HAS_NONLEAF_PMD_YOUNG isn't
> that helpful.

Hi Yu,
Thanks!

In the current kernel, which depends on reverse mapping, the CPU usage of
kswapd can be very high under memory pressure, especially when a lot of
pages have a large mapcount and therefore a huge reverse mapping cost.

Regarding <4%, I guess the figure came from machines with NONLEAF_PMD_YOUNG?
In that case, we can skip many PTE scans when a PMD does not have the
accessed bit set. But on a machine without NONLEAF, will the CPU usage
figure be much larger?
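
To make the question concrete, here is a toy model of it, not the kernel's
walk_pte_range(). It assumes 512 PTEs per PMD (4 KiB pages on x86-64) and
only counts PTE visits; count_pte_visits() and make_table() are
hypothetical helpers invented for this illustration.

```python
PTES_PER_PMD = 512  # assumption: 4 KiB pages under one 2 MiB PMD range


def make_table(num_pmds, hot_pmds, young_stride=8):
    """Build a toy page table: one list of PTE accessed bits per PMD.

    Only PMDs listed in hot_pmds contain young PTEs (every
    young_stride-th one). The PMD accessed bit is set iff any PTE
    under that PMD is young.
    """
    pte_young = []
    for pmd_idx in range(num_pmds):
        if pmd_idx in hot_pmds:
            row = [i % young_stride == 0 for i in range(PTES_PER_PMD)]
        else:
            row = [False] * PTES_PER_PMD
        pte_young.append(row)
    pmd_young = [any(row) for row in pte_young]
    return pmd_young, pte_young


def count_pte_visits(pmd_young, pte_young, has_nonleaf_young):
    """Return (ptes_visited, young_pages_found) for one full walk."""
    visited = 0
    found = 0
    for pmd_idx, young in enumerate(pmd_young):
        # With a non-leaf accessed bit, a PMD that is not young can be
        # skipped without looking at any of the PTEs under it.
        if has_nonleaf_young and not young:
            continue
        for pte in pte_young[pmd_idx]:
            visited += 1
            found += pte
    return visited, found


pmd_young, pte_young = make_table(num_pmds=8, hot_pmds={0, 3})
with_nonleaf = count_pte_visits(pmd_young, pte_young, True)
without_nonleaf = count_pte_visits(pmd_young, pte_young, False)
```

In this model both walks find the same young pages, but the walk without
the non-leaf bit visits every PTE. How much that matters on real hardware
depends on how sparse the accessed PMDs are, which is what I am asking
about.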

>
> > If so, this might be
> > a good proof that arm64 also needs this hardware feature?
> > In short, I am curious how much the improvement in this patchset depends
> > on the hardware ability of NONLEAF_PMD_YOUNG.
>
> For data centers, I do think ARCH_HAS_NONLEAF_PMD_YOUNG has some value.
> In addition to cold/hot memory scanning, there are other use cases like
> dirty tracking, which can benefit from the accessed bit on non-leaf
> entries. I know some proprietary software uses this capability on x86
> for different purposes than this patchset does. And AFAIK, x86 is the
> only arch that supports this capability, e.g., risc-v and ppc can only
> set the accessed bit in PTEs.

Yep. NONLEAF is a nice feature.

By the way, page tables should have a separate DIRTY bit, right? Wouldn't
dirty page tracking depend on the DIRTY bit rather than the accessed bit?
So does x86 also have a NONLEAF dirty bit? Or are they scanning the
accessed bit of the PMD before scanning the DIRTY bits of the PTEs?
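
A small sketch of the second possibility, purely as an assumption about
how such a scheme could work and not a description of any existing
software. ToyPageTable and its methods are hypothetical. The model
assumes that any access, read or write, sets the accessed bit on the PMD
it walks through, that only leaf PTEs carry a dirty bit, and that the
accessed bits were cleared (with TLBs flushed) at the start of the
tracking interval.

```python
class ToyPageTable:
    """Toy two-level table: PMD accessed bits plus per-PTE dirty bits."""

    def __init__(self, num_pmds, ptes_per_pmd):
        self.pmd_accessed = [False] * num_pmds
        self.pte_dirty = [[False] * ptes_per_pmd for _ in range(num_pmds)]

    def read(self, pmd_idx, pte_idx):
        # A read walks through the PMD, so the PMD becomes accessed,
        # but no dirty bit is set.
        self.pmd_accessed[pmd_idx] = True

    def write(self, pmd_idx, pte_idx):
        # A write also walks through the PMD, and additionally marks
        # the leaf PTE dirty.
        self.pmd_accessed[pmd_idx] = True
        self.pte_dirty[pmd_idx][pte_idx] = True

    def collect_dirty(self):
        """Return (dirty_pages, ptes_scanned), skipping untouched PMDs."""
        dirty = []
        scanned = 0
        for pmd_idx, accessed in enumerate(self.pmd_accessed):
            # A PMD that was never accessed cannot cover a newly
            # dirtied PTE in this model, so its PTEs are not scanned.
            if not accessed:
                continue
            for pte_idx, is_dirty in enumerate(self.pte_dirty[pmd_idx]):
                scanned += 1
                if is_dirty:
                    dirty.append((pmd_idx, pte_idx))
        return dirty, scanned
```

Under these assumptions the PMD accessed bit acts only as a filter: it
cannot say which pages are dirty, only which PMD ranges are worth
scanning for DIRTY bits.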

>
> In fact, I've discussed this with one of the arm maintainers Will. So
> please check with him too if you are interested in moving forward with
> the idea. I might be able to provide additional data if you need
> it to make a decision.

I am interested in running it and collecting some data without NONLEAF,
especially when free memory is very limited and the system is thrashing.

>
> Thanks.

Thanks
Barry
