linux-kernel - Re: [PATCH v6 0/9] Multigenerational LRU Framework

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YdythmxHpSksJiXs@google.com>
Date:   Mon, 10 Jan 2022 15:04:54 -0700
From:   Yu Zhao <yuzhao@...gle.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andi Kleen <ak@...ux.intel.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Hillf Danton <hdanton@...a.com>, Jens Axboe <axboe@...nel.dk>,
        Jesse Barnes <jsbarnes@...gle.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Jonathan Corbet <corbet@....net>,
        Matthew Wilcox <willy@...radead.org>,
        Mel Gorman <mgorman@...e.de>,
        Michael Larabel <Michael@...haellarabel.com>,
        Rik van Riel <riel@...riel.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Will Deacon <will@...nel.org>,
        Ying Huang <ying.huang@...el.com>,
        linux-arm-kernel@...ts.infradead.org, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        page-reclaim@...gle.com, x86@...nel.org
Subject: Re: [PATCH v6 0/9] Multigenerational LRU Framework

On Mon, Jan 10, 2022 at 04:39:51PM +0100, Michal Hocko wrote:
> On Fri 07-01-22 11:45:40, Yu Zhao wrote:
> [...]
> > Next, I argue that the benefits of this patchset outweigh its risks,
> > because, drawing from my past experience,
> > 1. There have been many larger and/or riskier patchsets taken; I'll
> >    assemble a list if you disagree.
> 
> No question about that. Changes in the reclaim path are paved with
> failures and reverts and fine tuning on top of existing fine tuning.
> The difference from your patchset is that they tend to be much much
> smaller and go incremental and therefore easier to review.

No argument here.

> >    And this patchset is fully guarded
> >    by #ifdef; Linus has also assessed on this point.
> 
> I appreciate you made the new behavior an opt-in and therefore existing
> workloads are less likely to regress. I do not think ifdefs help
> all that much, though, because a) realistically the config will
> likely be enabled for most distribution kernels

There is also a runtime kill switch.

> b) the parallel
> reclaim implementation adds a maintenance overhead regardless of those
> ifdef. The later point is especially worrying because the memory reclaim
> is a complex and hard to review beast already. Any future changes would
> need to consider both reclaim algorithms of course.

A perfectly legitimate concern.

If this patchset is taken:
1. There will be refactoring that makes the long-term maintenance as
   affordable as possible, i.e., similar to the SL.B model, but can
   also make runtime switch.
2. There will also be optimizations for mmu notifier (KVM), THP, etc.
3. Most importantly, Google will be committing more resource on this.
   And that's why we need to hear a decision -- our resource planning
   depends on it.

> Hence I argue we really need a wider consensus this is the right
> direction we want to pursue.

We've been doing our best to get this consensus -- we invited all
the stakeholders to meetings a long time ago -- but unfortunately we
couldn't move the needle.

I agree consensus is important. But, IMO, progress is even more
important. And personally, I'd rather try something wrong than do
nothing.

> > 2. There have been none that came with the testing/benchmarking
> >    coverage as this one did. Please point me to some if I'm mistaken,
> >    and I'll gladly match them.
> 
> I do appreciate your numbers but you should realize that this is an area
> that is really hard to get any conclusive testing for.

Fully agreed. That's why we started a new initiative, and we hope more
people will following these practices:
1. All results in this area should be reported with at least standard
   deviations, or preferably confidence intervals.
2. Real applications should be benchmarked (with synthetic load
   generator), not just synthetic benchmarks (not real applications).
3. A wide range of devices should be covered, i.e., servers, desktops,
   laptops and phones.

I'm very confident to say our benchmark reports were hold to the
highest standards. We have worked with MariaDB (company), EnterpriseDB
(Postgres), Redis (company), etc. on these reports. They have copies
of these reports (PDF version):
https://linux-mm.googlesource.com/benchmarks/

We welcome any expert in those applications to examine our reports,
and we'll be happy to run any other benchmarks or same benchmarks with
different configurations that anybody thinks it's important and we've
missed.

> We keep learning
> about fallouts on workloads we haven't really anticipated or where the
> runtime effects happen to disagree with our intuition. So while those
> numbers are nice there are other important aspects to consider like the
> maintenance cost for example.

I assume we agree this is not an easy decision. Can I also assume we
agree that this decision should be make within a reasonable time frame?

> > The numbers might not materialize in the real world; the code is not
> > perfect; and many other risks... But all the top eight open source
> > memory hogs were covered, which is unprecedented; memcached and fio
> > showed significant improvements and it only takes a few commands to
> > see for yourselves.
> > 
> > Regarding the acks and the reviewed-bys, I certainly can ask people
> > who have reaped the benefits of this patchset to do them, if it's
> > required. But I see less fun in that. I prefer to provide empirical
> > evidence and convince people who are on the other side of the aisle.
> 
> I like to hear from users who benefit from your work and that certainly
> gives more credit to it. But it will be the MM community to maintain the
> code and address future issues.

I'll ask downstream kernel maintainers (from different distros) that
have taken this patchset to ACK.

I'll ask credible testers who are professionals, researchers,
contributors to other subsystems to provide Test-by's. There are many
other individual testers I may not be able to acknowledge their
efforts, e.g., my coworker just sent this to me:

   Using that v5 for some time and confirm that difference under heavy
   load and memory pressure is significant."
https://www.phoronix.com/forums/forum/software/general-linux-open-source/1301258-mglru-is-a-very-enticing-enhancement-for-linux-in-2022#post1301275

I'll leave the reviews in your capable hands. As I said, I prefer to
convince people with empirical evidence.

> We do not have a dedicated maintainer for the memory reclaim but
> certainly there are people who have helped shaping the existing code and
> have learned a lot from the past issues - like Johannes, Rik, Mel just
> to name few. If I were you I would be really looking into finding an
> agreement with them. I myself can help you with memcg and oom side of
> the things (we already have discussions about those).

Unfortunately people have different priorities. As I said, we tried
to get all the stakeholders in the same (conference) room so that we
can make some good progress. But we failed.

Rest assured, we'll keep trying. But please understand we need to do
cost control and therefore we can't keep investing in this effort
forever. So I think it's not unreasonable, after I've addressed all
pending comments, to ask for some clear instructions from the
leadership:
    Yes
    No
    Or something specific

Thanks!