Message-ID: <20250814134826.154003-1-bharata@amd.com>
Date: Thu, 14 Aug 2025 19:18:19 +0530
From: Bharata B Rao <bharata@....com>
To: <linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>
CC: <Jonathan.Cameron@...wei.com>, <dave.hansen@...el.com>,
<gourry@...rry.net>, <hannes@...xchg.org>, <mgorman@...hsingularity.net>,
<mingo@...hat.com>, <peterz@...radead.org>, <raghavendra.kt@....com>,
<riel@...riel.com>, <rientjes@...gle.com>, <sj@...nel.org>,
<weixugc@...gle.com>, <willy@...radead.org>, <ying.huang@...ux.alibaba.com>,
<ziy@...dia.com>, <dave@...olabs.net>, <nifan.cxl@...il.com>,
<xuezhengchu@...wei.com>, <yiannis@...corp.com>, <akpm@...ux-foundation.org>,
<david@...hat.com>, <byungchul@...com>, <kinseyho@...gle.com>,
<joshua.hahnjy@...il.com>, <yuanchu@...gle.com>, <balbirs@...dia.com>,
Bharata B Rao <bharata@....com>
Subject: [RFC PATCH v1 0/7] A subsystem for hot page detection and promotion
Hi,
This patchset adds a dedicated sub-system that maintains hot page
information for the lower tiers and promotes the hot pages to the
top tiers. It exposes an API that other sub-systems which detect
accesses can use to report those accesses for further processing. Further
processing includes system-wide accumulation of memory access info at
PFN granularity, classification of the PFNs as hot, and promotion of hot
pages using per-node kernel threads. This is a continuation of the
earlier kpromoted work [1] that I posted a while back.
Kernel thread based async batch migration [2] was an off-shoot of
this effort that attempted to batch the migrations from NUMA
balancing by creating a separate kernel thread for migration.
In that approach, per-page hotness information was stored as part
of extended page flags. The kernel thread then scanned the entire
PFN space to pick the PFNs that were classified as hot.
The observed challenges from the previous approaches were these:
1. Too many PFNs need to be scanned to identify the hot PFNs in
approach [2].
2. Hot page records stored in hash lists become unwieldy for
extracting the required hot pages in approach [1].
3. Dynamic allocation vs static availability of space to store
per-page hotness information.
This series tries to address challenges 1 and 2 by maintaining
the hot page records in hash lists for quick lookup and maintaining
a separate per-target-node max heap for storing ready-to-migrate
hot page records. The records in the heap are priority-ordered by
the "hotness" of the page.
The API for reporting the page access remains unchanged from [1].
When a page access is recorded, the hotness data of the page is
updated and, if it crosses a threshold, the page is tracked in the
heap as well. These heaps are per-target-node, and the corresponding
migrate threads periodically extract the top records from them and
do batch migration.
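
To make the flow above concrete, here is a minimal user-space sketch. It
is not code from this series. All names (hot_rec, hot_heap, report_access,
pop_hottest) and all constants are hypothetical, a single heap stands in
for one target node, and locking is ignored entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define HASH_BUCKETS  1024 /* hypothetical hash table size */
#define HOT_THRESHOLD 2    /* hypothetical: accesses needed to count as hot */
#define HEAP_CAPACITY 256  /* hypothetical per-target-node heap size */

struct hot_rec {
	uint64_t pfn;
	int target_nid;
	unsigned int freq;    /* access count, used as the hotness key */
	int heap_idx;         /* position in the heap, -1 if not tracked there */
	struct hot_rec *next; /* hash chain */
};

struct hot_heap {
	struct hot_rec *slot[HEAP_CAPACITY];
	size_t len;
};

static struct hot_rec *buckets[HASH_BUCKETS];

static void heap_place(struct hot_heap *h, size_t i, struct hot_rec *r)
{
	assert(i < HEAP_CAPACITY);
	h->slot[i] = r;
	r->heap_idx = (int)i;
}

/* Move the record at index i towards the root while it is hotter than its parent. */
static void sift_up(struct hot_heap *h, size_t i)
{
	struct hot_rec *r = h->slot[i];

	while (i > 0) {
		size_t parent = (i - 1) / 2;

		if (h->slot[parent]->freq >= r->freq)
			break;
		heap_place(h, i, h->slot[parent]);
		i = parent;
	}
	heap_place(h, i, r);
}

/* Move the record at index i towards the leaves while a child is hotter. */
static void sift_down(struct hot_heap *h, size_t i)
{
	struct hot_rec *r = h->slot[i];

	for (;;) {
		size_t child = 2 * i + 1;

		if (child >= h->len)
			break;
		if (child + 1 < h->len &&
		    h->slot[child + 1]->freq > h->slot[child]->freq)
			child++;
		if (r->freq >= h->slot[child]->freq)
			break;
		heap_place(h, i, h->slot[child]);
		i = child;
	}
	heap_place(h, i, r);
}

/* Record one access: update the hash record, then track it in the heap once hot. */
int report_access(struct hot_heap *h, uint64_t pfn, int target_nid)
{
	struct hot_rec **head = &buckets[pfn % HASH_BUCKETS];
	struct hot_rec *r;

	for (r = *head; r && r->pfn != pfn; r = r->next)
		;
	if (!r) {
		r = calloc(1, sizeof(*r));
		if (!r)
			return -1;
		r->pfn = pfn;
		r->heap_idx = -1;
		r->next = *head;
		*head = r;
	}
	r->target_nid = target_nid;
	r->freq++;

	if (r->heap_idx >= 0) {
		sift_up(h, (size_t)r->heap_idx); /* already tracked, key grew */
	} else if (r->freq >= HOT_THRESHOLD && h->len < HEAP_CAPACITY) {
		heap_place(h, h->len, r);
		h->len++;
		sift_up(h, h->len - 1);
	}
	return 0;
}

/*
 * Extract the hottest record, or NULL if the heap is empty. In this sketch
 * the record stays in the hash table; record lifetime after migration is
 * an open choice here, not something taken from the series.
 */
struct hot_rec *pop_hottest(struct hot_heap *h)
{
	struct hot_rec *top;

	if (h->len == 0)
		return NULL;
	top = h->slot[0];
	top->heap_idx = -1;
	h->len--;
	if (h->len > 0) {
		heap_place(h, 0, h->slot[h->len]);
		sift_down(h, 0);
	}
	return top;
}
```

A migrate thread in this model would call pop_hottest() repeatedly to
build a batch. The hash table keeps lookups on the reporting path cheap,
while the heap avoids scanning all records to find the hottest ones.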
In the current series, two page temperature sources are included
as examples.
1. IBS based memory access profiler.
2. PTE-A bit based access profiler for MGLRU. (from Kinsey Ho)
TODOs:
- Currently only access frequency is used to calculate the hotness.
We could have a scalar hotness indicator based on both frequency
of access and time of access.
- There could be millions of record allocations and frees, some of
them from atomic contexts. Need to understand how problematic
this could be. Approach [2] mitigated this by having pre-allocated
hotness records for each page as part of extended page flags.
- The amount of data needed for tracking hotness is also a concern.
There is scope for packing the three parameters (nid, time, frequency)
in a more compact manner, which I will attempt in the next iterations.
- Migration rate-limiting needs to be added.
- Very lightly tested at the moment, as the current focus is to get
the hot data arrangement right.
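
To illustrate two of the TODO items above (a scalar hotness value that
combines frequency and time, and compact packing of nid, time and
frequency), here is one possible shape as a user-space sketch. The field
widths, the helper names and the halving decay rule are all assumptions
made for illustration. None of them are taken from this series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field widths, 32 bits in total. */
#define NID_BITS  6  /* up to 64 nodes */
#define FREQ_BITS 6  /* saturating access counter, 0..63 */
#define TIME_BITS 20 /* coarse timestamp that wraps around */

#define NID_MASK  ((1u << NID_BITS) - 1)
#define FREQ_MASK ((1u << FREQ_BITS) - 1)
#define TIME_MASK ((1u << TIME_BITS) - 1)

/* Hypothetical decay: the score halves every DECAY_PERIOD time units. */
#define DECAY_PERIOD 4

/* Pack (nid, freq, time) into one 32-bit word; freq saturates, time wraps. */
static inline uint32_t hot_pack(unsigned int nid, unsigned int freq,
				uint32_t time)
{
	assert(nid <= NID_MASK);
	if (freq > FREQ_MASK)
		freq = FREQ_MASK;
	return (nid & NID_MASK) |
	       (freq << NID_BITS) |
	       ((time & TIME_MASK) << (NID_BITS + FREQ_BITS));
}

static inline unsigned int hot_nid(uint32_t v)
{
	return v & NID_MASK;
}

static inline unsigned int hot_freq(uint32_t v)
{
	return (v >> NID_BITS) & FREQ_MASK;
}

static inline uint32_t hot_time(uint32_t v)
{
	return v >> (NID_BITS + FREQ_BITS);
}

/*
 * Scalar hotness: the stored frequency, halved once for every
 * DECAY_PERIOD time units elapsed since the recorded access time.
 * The subtraction is masked so that timestamp wrap-around is tolerated.
 */
static inline unsigned int hot_score(uint32_t v, uint32_t now)
{
	uint32_t age = (now - hot_time(v)) & TIME_MASK;
	uint32_t shift = age / DECAY_PERIOD;

	return shift >= FREQ_BITS ? 0 : hot_freq(v) >> shift;
}
```

With such a score as the heap key, a page that was accessed often but long
ago would rank below a page with fewer but more recent accesses.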
Regards,
Bharata.
[1] Kpromoted - https://lore.kernel.org/linux-mm/20250306054532.221138-1-bharata@amd.com/
[2] Kmigrated - https://lore.kernel.org/linux-mm/20250616133931.206626-1-bharata@amd.com/
Bharata B Rao (4):
mm: migrate: Allow misplaced migration without VMA too
mm: Hot page tracking and promotion
x86: ibs: In-kernel IBS driver for memory access profiling
x86: ibs: Enable IBS profiling for memory accesses
Gregory Price (1):
migrate: implement migrate_misplaced_folios_batch
Kinsey Ho (2):
mm: mglru: generalize page table walk
mm: klruscand: use mglru scanning for page promotion
arch/x86/events/amd/ibs.c | 11 +
arch/x86/include/asm/entry-common.h | 3 +
arch/x86/include/asm/hardirq.h | 2 +
arch/x86/include/asm/ibs.h | 9 +
arch/x86/include/asm/msr-index.h | 16 +
arch/x86/mm/Makefile | 3 +-
arch/x86/mm/ibs.c | 343 +++++++++++++++++++
include/linux/migrate.h | 6 +
include/linux/mmzone.h | 16 +
include/linux/pghot.h | 87 +++++
include/linux/vm_event_item.h | 26 ++
mm/Kconfig | 19 ++
mm/Makefile | 2 +
mm/internal.h | 4 +
mm/klruscand.c | 118 +++++++
mm/migrate.c | 36 +-
mm/mm_init.c | 10 +
mm/pghot.c | 501 ++++++++++++++++++++++++++++
mm/vmscan.c | 176 +++++++---
mm/vmstat.c | 26 ++
20 files changed, 1365 insertions(+), 49 deletions(-)
create mode 100644 arch/x86/include/asm/ibs.h
create mode 100644 arch/x86/mm/ibs.c
create mode 100644 include/linux/pghot.h
create mode 100644 mm/klruscand.c
create mode 100644 mm/pghot.c
--
2.34.1