lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YCOHZhJKr6DzFQGi@google.com>
Date:   Wed, 10 Feb 2021 00:12:38 -0700
From:   Yu Zhao <yuzhao@...gle.com>
To:     linux-mm@...ck.org
Cc:     Sonny Rao <sonnyrao@...gle.com>, Jann Horn <jannh@...gle.com>,
        Matthew Wilcox <willy@...radead.org>,
        Jesse Barnes <jsbarnes@...gle.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        page-reclaim@...gle.com
Subject: Re: [page-reclaim] Augmented Page Reclaim

On Tue, Feb 09, 2021 at 01:32:58PM -0800, Jesse Barnes wrote:
> > ======================
> > Augmented Page Reclaim
> > ======================
> > We would like to share a work with you and see if there is enough
> > interest to warrant a run for the mainline. This work is a part of
> > result from a decade of research and experimentation in memory
> > overcommit at Google: an augmented page reclaim that, in our
> > experience, is performant, versatile and, more importantly, simple.
> 
> Per discussion on IRC, maybe some additional background would help.

And I'll add more details to the doc included in the tree once I've
finished collecting feedback.

> In looking at browser workloads on Chrome OS, we found that reclaim was:
> 1) too expensive in terms of CPU usage

We have two general metrics for this item: CPU time spent on page
reclaim and (direct) page reclaim latency. CPU usage is important to
everybody but latency is also quite important for phones, laptops,
etc.

> 2) often making poor decisions about what to reclaim

We have another two metrics here: the number of refaults and the
number of false inactive pages. For example, it's bad if pages refault
within a hundred of milliseconds after they have been reclaimed. Also
it's bad if reclaim thinks many pages are inactive but later finds
they are actually active.

> This work was mainly targeted toward improving those things, with an
> eye toward interactive performance for browser workloads.
> 
> We have a few key tests we use for that, that measure tab switch times
> and number of tab discards when under memory pressure, and this
> approach significantly improves these (see Yu's data).

We monitor workload-specific metrics as well. For example, we found
page reclaim also affects battery life, tab switch latency and the
number of janks (pauses when scrolling web pages) on Chrome OS. I
don't want to dump everything here because they seem irrelevant to
most people.

> We do expect this approach will also be beneficial to cloud workloads,
> and so are looking for people to try it out in their environments with
> their favorite key tests or workloads.

And if you are interested in our workload-specific metrics, Android,
Cloud, etc., please feel free to contact us. Any other questions,
concerns and suggestions are also welcome.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ