linux-kernel - Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future

Message-ID: <Z4XUoWlU-UgRik18@gourry-fedora-PF4VCD3F>
Date: Mon, 13 Jan 2025 22:06:09 -0500
From: Gregory Price <gourry@...rry.net>
To: SeongJae Park <sj@...nel.org>
Cc: lsf-pc@...ts.linux-foundation.org, damon@...ts.linux.dev,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	kernel-team@...a.com, Raghavendra K T <raghavendra.kt@....com>,
	Yuanchu Xie <yuanchu@...gle.com>,
	Jonathan Cameron <Jonathan.Cameron@...wei.com>,
	Kaiyang Zhao <kaiyang2@...cmu.edu>,
	Jiaming Yan <jiamingy@...zon.com>, Honggyu Kim <honggyu.kim@...com>
Subject: Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of
 Future

On Wed, Jan 01, 2025 at 02:20:39PM -0800, SeongJae Park wrote:
> Hi all,
> 
> 
> I find a few interesting and promising projects that aim to do efficient access
> pattern-aware memory management of near future, including below (alphabetically
> sorted).
> 
> - Promotion of unmapped page cache folios
>   (https://lore.kernel.org/20241210213744.2968-1-gourry@gourry.net)

I'll break down a few observations I made while hacking on unmapped
page cache promotion, and my concerns about leveraging DAMON here.

I'll also cover some concerns I've seen raised about duplicating
promotion logic across various kernel components.

Latest RFC:
https://lore.kernel.org/linux-mm/20250107000346.1338481-1-gourry@gourry.net/

Basic Premise:
   Use folio_mark_accessed() as a measure of hotness for promotion.
   Defer promotion to task_work due to locking complexities.

My major concerns / lessons learned from this exercise include:

1) The cost of checking promotion candidacy can be problematic

   In my microbenchmark in the last RFC version, I showed that while
   the performance upside (~22-25%) is substantial, there was a
   non-trivial cost associated with injecting even a single global
   boolean check in the file_read() path.  This was unexpected.

   I can probably optimize the disabled case with a likely() clause,
   but I did not expect such sensitivity.  This tells me that injecting
   an unconditional call into DAMON may add too much overhead.

   I would need to explore this further, including whether it is
   feasible to inject such a large dependency into swap.c.

   This may not affect all cases, but it does affect at least this one.

2) The complexity of "when it is safe" to promote a folio is subtle
   at best, and "actively hostile" at worst.

   I learned in v1 of the RFC that promotion inline with
   folio_mark_accessed() is not feasible due to a few contexts (task
   dying in particular) in which migration is not safe.  I deferred to
   task_work because I noticed prior attempts (in development notes) had
   seen similar issues.

   Adding a folio reference and/or page flag to defer that migration to
   another context (e.g. an async kthread) solves this at the expense of
   implementation complexity (folios leak if this is done wrong).

   I'd have to look at whether it's worth the increased complexity to
   aggregate this (particular) identification mechanism - but I think
   there is clear value to aggregating promotion.

   I could see some value in pumping tracking bits into DAMON, but I
   also see value in making tasks handle promotion as a form of fairness.

3) Opinions were expressed on runtime fairness with respect to promotion.

   There are two competing thoughts:
   A) Making accessing tasks eat inline promotion cost captures that
      cost in their runtime slice, promoting fairness in scheduling.

   B) Aggregating promotion to an external thread can reduce inline
      faults and tail latencies, but may hide per-task cost.  This
      is a concern if one task drives all the promotions, effectively
      stealing an entire core by nature of the async design.

   I don't have a good answer to this, just an observation that charging
   promotion time to the identifying task was a concern that was raised.

4) TPP and Unmapped Page Promotion may affect each other.

   There is a rate-limiting mechanism in the migration path that was
   intended to keep aggressive migrations from over-pressuring memory
   bandwidth and causing major memory stalls.

   Adding another source of promotions puts more pressure on this
   shared limit, which increases the time it takes to converge.

   This is probably the greatest argument for creating a new, aggregated
   promotion mechanism to serve all of these identification mechanisms.

   This would make it easier for us to determine whether/what
   identification mechanisms can be aggregated while enabling forward
   progress on each of them separately.

5) Scarce resources

   We need to be careful not to consume excessive amounts of resources
   in an attempt to track all these identifying mechanisms.  Even 1 byte
   per folio is 256MB on a 1TB machine with 4KB folios.  This gets out
   of hand quickly.

   With task_work, I was able to avoid any additional resource
   consumption, but deferring to a fully async scenario would require
   tracking things like the last-accessing CPU, timestamps, etc.

   We'll need to examine this closely if we decide to aggregate either
   of these mechanisms.

~Gregory