[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YdiKVJlClB3h1Kmg@google.com>
Date: Fri, 7 Jan 2022 11:45:40 -0700
From: Yu Zhao <yuzhao@...gle.com>
To: Michal Hocko <mhocko@...e.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andi Kleen <ak@...ux.intel.com>,
Catalin Marinas <catalin.marinas@....com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Hillf Danton <hdanton@...a.com>, Jens Axboe <axboe@...nel.dk>,
Jesse Barnes <jsbarnes@...gle.com>,
Johannes Weiner <hannes@...xchg.org>,
Jonathan Corbet <corbet@....net>,
Matthew Wilcox <willy@...radead.org>,
Mel Gorman <mgorman@...e.de>,
Michael Larabel <Michael@...haellarabel.com>,
Rik van Riel <riel@...riel.com>,
Vlastimil Babka <vbabka@...e.cz>,
Will Deacon <will@...nel.org>,
Ying Huang <ying.huang@...el.com>,
linux-arm-kernel@...ts.infradead.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
page-reclaim@...gle.com, x86@...nel.org
Subject: Re: [PATCH v6 0/9] Multigenerational LRU Framework
On Fri, Jan 07, 2022 at 10:38:18AM +0100, Michal Hocko wrote:
> On Tue 04-01-22 13:30:00, Yu Zhao wrote:
> [...]
> > Hi Andrew, Linus,
> >
> > Can you please take a look at this patchset and let me know if it's
> > 5.17 material?
>
> I am still not done with the review and have seen at least few problems
> that would need to be addressed.
>
> But more fundamentally I believe there are really some important
> questions to be answered. First and foremost this is a major addition
> to the memory reclaim and there should be a wider consensus that we
> really want to go that way. The patchset doesn't have a single ack nor
> reviewed-by AFAICS. I haven't seen a lot of discussion since v2
> (http://lkml.kernel.org/r/20210413065633.2782273-1-yuzhao@google.com)
> nor do I see any clarification on how concerns raised there have been
> addressed or at least how they are planned to be addressed.
>
> Johannes has made some excellent points
> http://lkml.kernel.org/r/YHcpzZYD2fQyWvEQ@cmpxchg.org. Let me quote
> for reference part of it I find the most important:
> : Realistically, I think incremental changes are unavoidable to get this
> : merged upstream.
> :
> : Not just in the sense that they need to be smaller changes, but also
> : in the sense that they need to replace old code. It would be
> : impossible to maintain both, focus development and testing resources,
> : and provide a reasonably stable experience with both systems tugging
> : at a complicated shared code base.
> :
> : On the other hand, the existing code also has billions of hours of
> : production testing and tuning. We can't throw this all out overnight -
> : it needs to be surgical and the broader consequences of each step need
> : to be well understood.
> :
> : We also have millions of servers relying on being able to do upgrades
> : for drivers and fixes in other subsystems that we can't put on hold
> : until we stabilized a new reclaim implementation from scratch.
>
> Fully agreed on all points here.
>
> I do appreciate there is a lot of work behind this patchset and I
> also do understand it has gained a considerable amount of testing as
> well. Your numbers are impressive but my experience tells me that it is
> equally important to understand the worst case behavior and there is not
> really much mentioned about those in changelogs.
>
> We also shouldn't ignore costs the code is adding. One of them would be
> a further page flags depletion. We have been hitting problems on that
> front for years and many features had to be reworked to bypass a lack of
> space in page->flags.
>
> I will be looking more into the code (especially the memcg side of it)
> but I really believe that a consensus on above Johannes' points need to
> be found first before this work can move forward.
Thanks for the summary. I appreciate your time and I agree your
assessment is fair.
So I've acknowledged your concerns, and you've acknowledged my numbers
(the performance improvements) are impressive.
Now we are in agreement, cheers.
Next, I argue that the benefits of this patchset outweigh its risks,
because, drawing from my past experience,
1. There have been many larger and/or riskier patchsets taken; I'll
assemble a list if you disagree. And this patchset is fully guarded
by #ifdef; Linus has also assessed on this point.
2. There have been none that came with the testing/benchmarking
coverage as this one did. Please point me to some if I'm mistaken,
and I'll gladly match them.
The numbers might not materialize in the real world; the code is not
perfect; and many other risks... But all the top eight open source
memory hogs were covered, which is unprecedented; memcached and fio
showed significant improvements and it only takes a few commands to
see for yourselves.
Regarding the acks and the reviewed-bys, I certainly can ask people
who have reaped the benefits of this patchset to do them, if it's
required. But I see less fun in that. I prefer to provide empirical
evidence and convince people who are on the other side of the aisle.
Powered by blists - more mailing lists