Message-ID: <Z4XUoWlU-UgRik18@gourry-fedora-PF4VCD3F>
Date: Mon, 13 Jan 2025 22:06:09 -0500
From: Gregory Price <gourry@...rry.net>
To: SeongJae Park <sj@...nel.org>
Cc: lsf-pc@...ts.linux-foundation.org, damon@...ts.linux.dev,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
kernel-team@...a.com, Raghavendra K T <raghavendra.kt@....com>,
Yuanchu Xie <yuanchu@...gle.com>,
Jonathan Cameron <Jonathan.Cameron@...wei.com>,
Kaiyang Zhao <kaiyang2@...cmu.edu>,
Jiaming Yan <jiamingy@...zon.com>, Honggyu Kim <honggyu.kim@...com>
Subject: Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of
Future
On Wed, Jan 01, 2025 at 02:20:39PM -0800, SeongJae Park wrote:
> Hi all,
>
>
> I find a few interesting and promising projects that aim to do efficient access
> pattern-aware memory management of near future, including below (alphabetically
> sorted).
>
> - Promotion of unmapped page cache folios
> (https://lore.kernel.org/20241210213744.2968-1-gourry@gourry.net)
I'll break down a few observations I made while hacking on unmapped
page cache promotion - and my concerns about leveraging DAMON here,
along with some other concerns I've seen raised about duplicating
promotion logic across various kernel components.
Latest RFC:
https://lore.kernel.org/linux-mm/20250107000346.1338481-1-gourry@gourry.net/
Basic Premise:
Use folio_mark_accessed() as a measure of hotness for promotion.
Defer promotion to task_work due to locking complexities.
My major concerns / lessons learned from this exercise include:
1) The cost of checking promotion candidacy can be problematic
In my microbenchmark in the last RFC version, I showed that while
the performance upside (~22-25%) is substantial, there was a
non-trivial cost associated with injecting even a single global
boolean check in the file_read() path. This was unexpected.
I can probably optimize the disabled case with a likely() clause,
but I did not expect such sensitivity. This tells me injecting
an unconditional call into DAMON may be too much overhead.
I would need to explore this further - including whether it is
feasible to inject such a large dependency into swap.c
This may not affect all cases, but it does affect at least this one.
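As a rough illustration of the "likely() clause" idea in point 1, here is a minimal user-space sketch. The kernel's likely()/unlikely() macros are built on the compiler's __builtin_expect; everything else here (promotion_enabled, read_path_hook, check_promotion_candidate) is a hypothetical name chosen for the example and is not taken from the RFC.

```c
#include <assert.h>
#include <stdbool.h>

/* Same shape as the kernel's unlikely(): hint that the condition is rarely true. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical global toggle; false models the "feature disabled" case. */
static bool promotion_enabled;

/* Counts how often the (hypothetical) candidacy check actually ran. */
static unsigned long promotion_checks;

static void check_promotion_candidate(void)
{
    /* Only reachable through the gated hook below. */
    assert(promotion_enabled);
    promotion_checks++;
}

/* Hypothetical hook on the read path: the disabled case is the hinted fast path. */
static inline void read_path_hook(void)
{
    if (unlikely(promotion_enabled))
        check_promotion_candidate();
}
```

The hint only influences how the compiler lays out the branch. The load of the global and the compare are still executed on every call, so this can reduce the disabled-case cost but not remove it.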
2) The complexity of "when it is safe" to promote a folio is subtle
at best, and "actively hostile" at worst.
I learned in v1 of the RFC that promotion inline with
folio_mark_accessed() (fma) is not feasible due to a few contexts
(task dying in particular) in which
migration is not safe. I deferred to task work because I noticed
prior attempts (in development notes) had seen similar issues.
Adding a folio reference and/or page flag to defer that migration to
another context (e.g. async kthread) solves this at the expense of
implementation complexity (leaked folios if done wrong).
I'd have to look at whether it's worth the increased complexity to
aggregate this (particular) identification mechanism - but I think
there is clear value to aggregating promotion.
I could see some value in pumping tracking bits into DAMON - but I
also see value in making tasks handle promotion as a form of fairness.
3) There were expressed opinions on runtime fairness WRT promotion.
There are two competing thoughts:
A) Making accessing tasks eat inline promotion cost captures that
cost in their runtime slice, promoting fairness in scheduling.
B) Aggregating promotion to an external thread can reduce inline
faults and tail latencies, but may hide per-task cost. This
is a concern if one task drives all the promotions, effectively
stealing an entire core by nature of the async design.
I don't have a good answer to this, just an observation that charging
promotion time to the identifying task was a concern that was raised.
4) TPP and Unmapped Page Promotion may affect each other.
There is a rate-limiting mechanism in the migration path that was
intended to prevent over-pressuring bandwidth with aggressive
migrations, and so prevent major memory stalls.
By adding more pressure on this limit from an additional source,
we're obviously increasing the time it takes to converge.
This is probably the greatest argument for creating a new, aggregated
promotion mechanism to serve all of these identification mechanisms.
This would make it easier for us to determine whether/what
identification mechanisms can be aggregated while enabling forward
progress on each of them separately.
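To make the interaction in point 4 concrete, here is a small sketch of a single per-window budget charged by more than one promotion source. This is not the kernel's actual rate limiter; struct promo_budget, promo_try_charge() and promo_window_reset() are hypothetical names, and the budget is counted in pages purely as an assumption for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared budget: pages that may be promoted in one time window. */
struct promo_budget {
    uint64_t limit_pages; /* allowance per window */
    uint64_t used_pages;  /* already charged in this window, always <= limit_pages */
};

/*
 * Try to charge nr_pages against the shared budget.
 * Returns false (and charges nothing) if the request does not fit.
 */
static bool promo_try_charge(struct promo_budget *b, uint64_t nr_pages)
{
    if (nr_pages > b->limit_pages - b->used_pages)
        return false;
    b->used_pages += nr_pages;
    return true;
}

/* Start a new window: the full allowance is available again. */
static void promo_window_reset(struct promo_budget *b)
{
    b->used_pages = 0;
}
```

With a limit of 100 pages, if one source charges 60 pages first, a second source asking for 60 in the same window is refused and has to wait for the next window. That is the slower convergence described above: each source added to the same limit leaves less headroom for the others.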
5) Scarce resources
We need to be careful not to consume excessive amounts of resources
in an attempt to track all these identifying mechanisms. Even 1 byte
per 4KB folio is 256MB on a 1TB machine. This gets out of hand quickly.
With task-work, I was able to add no additional resource consumption,
but deferring to a fully async scenario would mean needing to track
things like last-accessing CPU, timestamps, etc.
We'll need to examine this closely if we decide to aggregate either
of these mechanisms.
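The arithmetic behind the figure in point 5 can be checked with a short helper. It assumes every folio is a single 4KB base page (large folios would lower the count); tracking_overhead_bytes() is a hypothetical name used only for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of tracking metadata for a machine with mem_bytes of memory,
 * assuming uniformly sized folios of folio_bytes each and
 * per_folio_bytes of metadata per folio.
 */
static uint64_t tracking_overhead_bytes(uint64_t mem_bytes,
                                        uint64_t folio_bytes,
                                        uint64_t per_folio_bytes)
{
    assert(folio_bytes > 0);
    return (mem_bytes / folio_bytes) * per_folio_bytes;
}
```

For 1TB (2^40 bytes) and 4KB folios there are 2^28 folios, so 1 byte each is 2^28 bytes = 256MB. An 8-byte timestamp per folio on the same machine would come to 2GB.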
~Gregory