Message-Id: <20220105085534.22981-1-sj@kernel.org>
Date: Wed, 5 Jan 2022 08:55:34 +0000
From: SeongJae Park <sj@...nel.org>
To: Yu Zhao <yuzhao@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andi Kleen <ak@...ux.intel.com>,
Catalin Marinas <catalin.marinas@....com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Hillf Danton <hdanton@...a.com>, Jens Axboe <axboe@...nel.dk>,
Jesse Barnes <jsbarnes@...gle.com>,
Johannes Weiner <hannes@...xchg.org>,
Jonathan Corbet <corbet@....net>,
Matthew Wilcox <willy@...radead.org>,
Mel Gorman <mgorman@...e.de>,
Michael Larabel <Michael@...haellarabel.com>,
Michal Hocko <mhocko@...nel.org>,
Rik van Riel <riel@...riel.com>,
Vlastimil Babka <vbabka@...e.cz>,
Will Deacon <will@...nel.org>,
Ying Huang <ying.huang@...el.com>,
linux-arm-kernel@...ts.infradead.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
page-reclaim@...gle.com, x86@...nel.org
Subject: Re: [PATCH v6 0/9] Multigenerational LRU Framework
Hi Yu,
On Tue, 4 Jan 2022 13:22:19 -0700 Yu Zhao <yuzhao@...gle.com> wrote:
> TLDR
> ====
> The current page reclaim is too expensive in terms of CPU usage and it
> often makes poor choices about what to evict. This patchset offers an
> alternative solution that is performant, versatile and
> straightforward.
>
[...]
> Summary
> =======
> The facts are:
> 1. The independent lab results and the real-world applications
> indicate substantial improvements; there are no known regressions.
Such impressive results!
> 2. Thrashing prevention, working set estimation and proactive reclaim
> work out of the box; there are no equivalent solutions.
I think similar functionality is already available out of the box in the latest
mainline tree, though it might be suboptimal in some cases.
First, you can do thrashing prevention using DAMON-based Operation Schemes
(DAMOS)[1] with the MADV_COLD action. Second, for working set estimation, you
can either use DAMOS again with the statistics action, or the damon_aggregated
tracepoint[2]. The DAMON user space tool[3] helps with analysis and
visualization of the tracepoint output. Finally, for proactive reclaim, you can
again use DAMOS with the MADV_PAGEOUT action, or simply the DAMON-based
proactive reclaim module (DAMON_RECLAIM)[4].
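To illustrate the idea only, here is a toy user-space model of how such a
scheme pairs a target access pattern (size, access frequency, age) with an
action. The class, field and action names are hypothetical and chosen for this
sketch; this is not DAMON's kernel code or its actual user interface.

```python
from dataclasses import dataclass

MAX = 2**64  # stand-in for "no upper bound" (assumption for this sketch)


@dataclass
class Region:
    start: int        # start address in bytes
    end: int          # end address in bytes, exclusive
    nr_accesses: int  # accesses observed during the last aggregation
    age: int          # aggregations for which the pattern has been stable

    @property
    def size(self) -> int:
        return self.end - self.start


@dataclass
class Scheme:
    min_size: int
    max_size: int
    min_accesses: int
    max_accesses: int
    min_age: int
    max_age: int
    action: str  # hypothetical labels: "cold", "pageout", "stat"

    def matches(self, r: Region) -> bool:
        # A region is a target only if all three properties are in range.
        return (self.min_size <= r.size <= self.max_size
                and self.min_accesses <= r.nr_accesses <= self.max_accesses
                and self.min_age <= r.age <= self.max_age)


def apply_schemes(regions, schemes):
    """Return {action: total bytes of the regions each scheme matched}."""
    matched = {}
    for s in schemes:
        total = sum(r.size for r in regions if s.matches(r))
        matched[s.action] = matched.get(s.action, 0) + total
    return matched


regions = [
    Region(0, 4 << 20, nr_accesses=0, age=30),         # idle for a long time
    Region(4 << 20, 6 << 20, nr_accesses=15, age=2),   # hot
    Region(6 << 20, 16 << 20, nr_accesses=0, age=3),   # idle only recently
]
schemes = [
    # Proactive reclaim: page out regions that stayed unaccessed for long.
    Scheme(4096, MAX, 0, 0, 20, MAX, "pageout"),
    # Working set estimation: only count frequently accessed regions.
    Scheme(4096, MAX, 10, MAX, 0, MAX, "stat"),
]
result = apply_schemes(regions, schemes)
# result == {"pageout": 4194304, "stat": 2097152}
```

Only the first region is old and idle enough to be paged out, while the second
one is the only region counted as part of the working set.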
Nevertheless, as noted above, the current DAMON-based solutions might be
suboptimal in some cases. First of all, DAMON currently doesn't provide
page-granularity monitoring. Though its monitoring results have been useful for
our users' production use, there could be different requirements and
situations. Secondly, DAMON-based thrashing prevention wouldn't reduce the CPU
usage of the reclamation logic's access scanning.
So, to me, the MGLRU patchset seems to provide something that DAMON doesn't
provide, but also something that DAMON already provides. Specifically,
efficient page-granularity access scanning is what DAMON doesn't provide for
now. However, the use of the access information for LRU list manipulation
(thrashing prevention) and proactive reclamation is similar to what DAMON
(specifically, DAMOS) provides. In addition, this patchset reduces the
reclamation logic's CPU usage by means of the efficient page-granularity access
scanning.
IMHO, we might be able to reduce the duplication by integrating MGLRU into
DAMON. What I'm saying is, we could 1) introduce the efficient page-granularity
access scanning, 2) reduce the reclamation logic's CPU usage by making it use
that scanning, and 3) extend DAMON for page-granularity monitoring with the
efficient access scanning[5]. Then, users could get the benefit of MGLRU by
using DAMOS configured to use your efficient page-granularity access scanning.
To make it simpler, we could extend the existing kernel logic to use DAMON in
that way, or implement a new kernel module. Additional advantages of this
approach would be 1) reducing the changes to the existing code, and 2) making
the efficient page-granularity access information usable for more general
cases.
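As a rough illustration of the layering idea, the toy model below keeps the
upper layer (picking reclaim candidates) unaware of which access check
primitive is plugged in. Everything here is hypothetical: the function names,
the set of accessed page addresses standing in for hardware accessed bits, and
the fixed middle-page sample (used only to keep the example deterministic). It
is neither DAMON nor MGLRU code.

```python
PAGE = 4096  # assumed page size in bytes


def region_sampling(start, end, accessed):
    """Region-granularity check: test one sampled page and let the result
    represent every page of the region."""
    pages = range(start, end, PAGE)
    sample = pages[len(pages) // 2]
    hit = sample in accessed
    return {p: hit for p in pages}


def page_scan(start, end, accessed):
    """Page-granularity check: test every page individually."""
    return {p: p in accessed for p in range(start, end, PAGE)}


def cold_pages(check, start, end, accessed):
    """Upper layer: list reclaim candidates using whichever primitive
    was plugged in via `check`."""
    result = check(start, end, accessed)
    return sorted(p for p, hit in result.items() if not hit)


accessed = {0}  # only the first page of a four-page range was touched
coarse = cold_pages(region_sampling, 0, 4 * PAGE, accessed)
fine = cold_pages(page_scan, 0, 4 * PAGE, accessed)
# coarse == [0, 4096, 8192, 12288]  (the touched page is misjudged as cold)
# fine   == [4096, 8192, 12288]
```

In this model, swapping the primitive changes the precision of the result
without touching the upper layer, which is the property the integration would
rely on.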
Of course, the integration might not be as simple as it seems to me now. We
could also keep DAMON and MGLRU side by side as they are now, and let users
select what they really want. I think it's up to you.
I haven't read this patchset thoroughly yet, so I might be missing many things.
If so, please feel free to let me know.
[1] https://docs.kernel.org/admin-guide/mm/damon/usage.html#schemes
[2] https://docs.kernel.org/admin-guide/mm/damon/usage.html#tracepoint-for-monitoring-results
[3] https://github.com/awslabs/damo
[4] https://docs.kernel.org/admin-guide/mm/damon/reclaim.html
[5] https://docs.kernel.org/vm/damon/design.html#configurable-layers
Thanks,
SJ
[...]