linux-kernel - Re: [LSF/MM/BPF TOPIC] Overhauling hot page detection and promotion based on PTE A bit scanning

Message-Id: <20250123182050.53941-1-sj@kernel.org>
Date: Thu, 23 Jan 2025 10:20:50 -0800
From: SeongJae Park <sj@...nel.org>
To: Raghavendra K T <raghavendra.kt@....com>
Cc: SeongJae Park <sj@...nel.org>,
	linux-mm@...ck.org,
	akpm@...ux-foundation.org,
	lsf-pc@...ts.linux-foundation.org,
	bharata@....com,
	gourry@...rry.net,
	nehagholkar@...a.com,
	abhishekd@...a.com,
	ying.huang@...ux.alibaba.com,
	nphamcs@...il.com,
	hannes@...xchg.org,
	feng.tang@...el.com,
	kbusch@...a.com,
	Hasan.Maruf@....com,
	david@...hat.com,
	willy@...radead.org,
	k.shutemov@...il.com,
	mgorman@...hsingularity.net,
	vbabka@...e.cz,
	hughd@...gle.com,
	rientjes@...gle.com,
	shy828301@...il.com,
	liam.howlett@...cle.com,
	peterz@...radead.org,
	mingo@...hat.com,
	nadav.amit@...il.com,
	shivankg@....com,
	ziy@...dia.com,
	jhubbard@...dia.com,
	AneeshKumar.KizhakeVeetil@....com,
	linux-kernel@...r.kernel.org,
	jon.grimm@....com,
	santosh.shukla@....com,
	Michael.Day@....com,
	riel@...riel.com,
	weixugc@...gle.com,
	leesuyeon0506@...il.com,
	honggyu.kim@...com,
	leillc@...gle.com,
	kmanaouil.dev@...il.com,
	rppt@...nel.org,
	dave.hansen@...el.com
Subject: Re: [LSF/MM/BPF TOPIC] Overhauling hot page detection and promotion based on PTE A bit scanning

Hi Raghavendra,

On Thu, 23 Jan 2025 10:57:21 +0000 Raghavendra K T <raghavendra.kt@....com> wrote:

> Bharata and I would like to propose the following topic for LSFMM.
> 
> Topic: Overhauling hot page detection and promotion based on PTE A bit scanning.

Thank you for proposing this.  I'm interested in this!

>  
> In the Linux kernel, hot page information can potentially be obtained from
> multiple sources:
>  
> a. PROT_NONE faults (NUMA balancing)
> b. PTE Access bit (LRU scanning)
> c. Hardware provided page hotness info (like AMD IBS)
>  
> This information is further used to migrate (or promote) pages from slow memory
> tier to top tier to increase performance.
> 
> In the current hot page promotion mechanism, all the activities including the
> process address space scanning, NUMA hint fault handling and page migration are
> performed in the process context. i.e., scanning overhead is borne by the
> applications.

I understand that you're referring only to fully in-kernel solutions.  Just for
readers' context, SK hynix's HMSDK capacity expansion[1] does the work in two
asynchronous threads (one for promotion and the other for demotion), using
DAMON in the kernel as the core worker and controlling DAMON from user space.

>  
> I had recently posted a patch [1] to improve this in the context of slow-tier
> page promotion. Here, Scanning is done by a global kernel thread which routinely
> scans all the processes' address spaces and checks for accesses by reading the
> PTE A bit. The hot pages thus identified are maintained in list and subsequently
> are promoted to a default top-tier node. Thus, the approach pushes overhead of
> scanning, NUMA hint faults and migrations off from process context.
> 
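For readers' context, the scanning flow described above can be pictured with a
toy user-space model.  This is only a sketch under my own assumptions, not the
code from the posted patch: the class name, the dictionary-based "page table",
and the hot_threshold tunable are all hypothetical.

```python
from collections import defaultdict


class ToyScanner:
    """Toy model of a global scanner thread; this is not kernel code."""

    def __init__(self, hot_threshold=2):
        self.hot_threshold = hot_threshold  # hypothetical tunable
        self.hits = defaultdict(int)        # (pid, vpn) -> scans that saw the A bit set
        self.hot_list = []                  # promotion candidates, in detection order

    def scan(self, address_spaces):
        """address_spaces: {pid: {vpn: {"accessed": bool, "tier": str}}}"""
        for pid, ptes in address_spaces.items():
            for vpn, pte in ptes.items():
                if not pte["accessed"]:
                    continue
                # Clear the bit so the next scan only sees new accesses.
                pte["accessed"] = False
                key = (pid, vpn)
                self.hits[key] += 1
                if (pte["tier"] == "slow"
                        and self.hits[key] >= self.hot_threshold
                        and key not in self.hot_list):
                    self.hot_list.append(key)

    def promote(self, address_spaces):
        """Move every listed page to the top tier, outside 'process context'."""
        promoted = list(self.hot_list)
        for pid, vpn in promoted:
            address_spaces[pid][vpn]["tier"] = "top"
        self.hot_list.clear()
        return promoted
```

The point of the model is only that the scan and the promotion both run in the
scanner, so the scanned processes do not pay for them.
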
> The topic was presented in the MM alignment session hosted by David Rientjes [2].
> The topic also finds a mention in S J Park's LSFMM proposal [3].
>  
> Here is the list of potential discussion points:

Great discussion points, thank you.  I'm adding how DAMON tries to deal with
some of the points below.

> 1. Other improvements and enhancements to PTE A bit scanning approach. Use of
> multiple kernel threads,

DAMON allows the use of multiple kernel threads for different monitoring scopes.
There were also ideas for splitting the monitoring part and the migration-like
system operation part into different threads.

> throttling improvements,

DAMON provides features called "adaptive regions adjustment" and "DAMOS quotas"
for throttling the overhead of access monitoring and of migration-like system
operation actions, respectively.
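
As a rough illustration of the quota idea only (this is not DAMON's code; the
class, method, and parameter names below are made up for this sketch), a size
quota that resets at a fixed interval could look like this:

```python
class SizeQuota:
    """Allow at most max_bytes of work per reset interval (toy model)."""

    def __init__(self, max_bytes, reset_interval_s):
        self.max_bytes = max_bytes
        self.reset_interval_s = reset_interval_s
        self.window_start = 0.0
        self.charged = 0

    def try_charge(self, nbytes, now):
        """Return True and charge the quota if nbytes still fits in this window."""
        if now - self.window_start >= self.reset_interval_s:
            # A new window has started; forget what was charged before.
            self.window_start = now
            self.charged = 0
        if self.charged + nbytes > self.max_bytes:
            return False  # over quota: skip this action until the next window
        self.charged += nbytes
        return True
```

A migration loop would call try_charge() with the size of each candidate region
and skip the region when it returns False.
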

> promotion policies,

DAMON's access-aware system operation feature (DAMOS) allows setting this kind
of system operation policy based on the access pattern and additional
information, including page-level information such as whether the page is
anonymous, which cgroup it belongs to, and a page-granularity A bit recheck.

> per-process opt-in via prctl,

DAMON allows applying the system operation action only to pages belonging to
specific cgroups, using a feature called DAMOS filters.  It is not integrated
with prctl and works at cgroup scope, but it may be usable here.  Extending
DAMOS filters to match the owning process may also be doable.

> virtual vs physical address based scanning,

DAMON supports monitoring of both virtual and physical address spaces.  DAMON's
page migration is currently not supported for virtual address spaces, though I
believe adding the support is not difficult.

I'm a bit in favor of the physical address space, probably because I'm biased
toward what DAMON currently supports, but also because of edge cases such as
promotion of unmapped pages.

> tuning hot page detection algorithm etc.

DAMON requires users to manually tune some important parameters for hot page
detection.  We recently provided a tuning guide[2], and are working on
automating the tuning.  I believe the essential problem is similar across many
use cases regardless of the type of low-level access check primitive, so I want
to learn whether the tuning automation idea can be used generally.

> 
> 2. Possibility of maintaining single source of truth for page hotness that would
> maintain hot page information from multiple sources and let other sub-systems
> use that info.

DAMON currently uses the PTE A bit as its essential access check primitive.
We designed DAMON to be extensible to other access check primitives, such as
page faults and h/w features like AMD IBS.  We are now planning such an
extension, though it is still at a very early, low-priority planning stage.
DAMON also provides a kernel API.

> 
> 3. Discuss how hardware provided hotness info (like AMD IBS) can further aid
> promotion. Bharata had posted an RFC [4] on this a while back.

Maybe the CXL Hotness Monitoring Unit could also be an interesting thing to
discuss together.

> 
> 4. Overlap with DAMON and potential reuse.

I confess that, to my biased eyes, some of the work seems to overlap with
DAMON.  I'm looking forward to attending this session, to become less biased
and more aligned with people :)

>  
> Links:
> 
> [1] https://lore.kernel.org/all/20241201153818.2633616-1-raghavendra.kt@amd.com/
> [2] https://lore.kernel.org/linux-mm/20241226012833.rmmbkws4wdhzdht6@ed.ac.uk/T/
> [3] https://lore.kernel.org/lkml/Z4XUoWlU-UgRik18@gourry-fedora-PF4VCD3F/T/
> [4] https://lore.kernel.org/lkml/20230208073533.715-2-bharata@amd.com/

Again, thank you for proposing this topic, and I hope to see you in Montreal!


[1] https://github.com/skhynix/hmsdk/wiki/Capacity-Expansion
[2] https://lkml.kernel.org/r/20250110185232.54907-1-sj@kernel.org


Thanks,
SJ