linux-kernel - Re: [RFC PATCH v1 0/7] A subsystem for hot page detection and promotion


Message-ID: <14359326-bdc2-4d9a-b243-b5ffcad0716b@amd.com>
Date: Fri, 15 Aug 2025 21:05:32 +0530
From: Bharata B Rao <bharata@....com>
To: Balbir Singh <balbirs@...dia.com>, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org
Cc: Jonathan.Cameron@...wei.com, dave.hansen@...el.com, gourry@...rry.net,
 hannes@...xchg.org, mgorman@...hsingularity.net, mingo@...hat.com,
 peterz@...radead.org, raghavendra.kt@....com, riel@...riel.com,
 rientjes@...gle.com, sj@...nel.org, weixugc@...gle.com, willy@...radead.org,
 ying.huang@...ux.alibaba.com, ziy@...dia.com, dave@...olabs.net,
 nifan.cxl@...il.com, xuezhengchu@...wei.com, yiannis@...corp.com,
 akpm@...ux-foundation.org, david@...hat.com, byungchul@...com,
 kinseyho@...gle.com, joshua.hahnjy@...il.com, yuanchu@...gle.com
Subject: Re: [RFC PATCH v1 0/7] A subsystem for hot page detection and
 promotion

On 15-Aug-25 5:29 PM, Balbir Singh wrote:
> On 8/14/25 23:48, Bharata B Rao wrote:
>> Hi,
>>
>> This patchset is about adding a dedicated sub-system for maintaining
>> hot pages information from the lower tiers and promoting the hot pages
>> to the top tiers. It exposes an API that other sub-systems which detect
>> accesses can use to report the accesses for further processing. Further
>> processing includes system-wide accumulation of memory access info at
>> PFN granularity, classification of the PFNs as hot, and promotion of hot
>> pages using per-node kernel threads. This is a continuation of the
>> earlier kpromoted work [1] that I posted a while back.
>>
>> Kernel thread based async batch migration [2] was an off-shoot of
>> this effort that attempted to batch the migrations from NUMA
>> balancing by creating a separate kernel thread for migration.
>> Per-page hotness information was stored as part of extended page
>> flags. The kernel thread then scanned the entire PFN space to pick
>> the PFNs that are classified as hot.
>>
>> The observed challenges from the previous approaches were these:
>>
>> 1. Too many PFNs need to be scanned to identify the hot PFNs in
>>    approach [2].
>> 2. Hot page records stored in hash lists become unwieldy for
>>    extracting the required hot pages in approach [1].
>> 3. Dynamic allocation vs static availability of space to store
>>    per-page hotness information.
>>
>> This series tries to address challenges 1 and 2 by maintaining
>> the hot page records in hash lists for quick lookup and maintaining
>> a separate per-target-node max heap for storing ready-to-migrate
>> hot page records. The records in the heap are priority-ordered based
>> on the "hotness" of the page.
>>
> 
> Could you elaborate on when/how a page is considered hot? Is it based
> on how often a page has been scanned?

There are multiple sub-systems within the kernel which detect and
act upon page accesses. NUMA balancing (via hint faults) and MGLRU
(via page table scanning for the PTE Accessed bit) are two examples.
The idea behind this patchset is to consolidate such access information
within a new dedicated sub-system for hot page promotion that
maintains hotness data for accessed pages and promotes them when
a threshold is reached.
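A minimal userspace sketch of such a consolidation point, assuming a
single reporting function that every access source (hint faults, A-bit
scanning, ...) calls with the PFN, the target top-tier node and a
timestamp. Everything here is hypothetical: the names (hot_record_access(),
hot_find(), struct hot_rec), the threshold of 3 and the low-bits hash are
illustrations, not the patchset's API, and there is no locking.

```c
#include <assert.h>
#include <stdlib.h>

#define HOT_HASH_SIZE 64 /* number of hash buckets; power of two */
#define HOT_THRESHOLD 3  /* hypothetical: accesses before a page is hot */

/* Hypothetical per-PFN hotness record. */
struct hot_rec {
	unsigned long pfn;
	int target_nid;            /* top-tier node to promote to */
	unsigned int freq;         /* number of reported accesses */
	unsigned long last_access; /* time of the latest report */
	int is_hot;                /* threshold already crossed */
	struct hot_rec *next;      /* hash-chain link */
};

static struct hot_rec *hot_hash[HOT_HASH_SIZE];

static size_t hot_bucket(unsigned long pfn)
{
	return pfn & (HOT_HASH_SIZE - 1); /* low PFN bits as a simple hash */
}

/* Return the record for pfn, or NULL if it was never reported. */
struct hot_rec *hot_find(unsigned long pfn)
{
	struct hot_rec *r;

	for (r = hot_hash[hot_bucket(pfn)]; r; r = r->next)
		if (r->pfn == pfn)
			return r;
	return NULL;
}

/*
 * Hypothetical single entry point for every access source. Returns 1
 * when this report makes the page cross HOT_THRESHOLD for the first
 * time, 0 otherwise, and -1 if a new record cannot be allocated.
 */
int hot_record_access(unsigned long pfn, int nid, unsigned long now)
{
	struct hot_rec *r = hot_find(pfn);

	assert(nid >= 0);
	if (!r) {
		r = calloc(1, sizeof(*r));
		if (!r)
			return -1;
		r->pfn = pfn;
		r->target_nid = nid;
		r->next = hot_hash[hot_bucket(pfn)];
		hot_hash[hot_bucket(pfn)] = r;
	}
	r->freq++;
	r->last_access = now;
	if (!r->is_hot && r->freq >= HOT_THRESHOLD) {
		r->is_hot = 1;
		return 1;
	}
	return 0;
}
```

Because a crossing is reported only once per record, a caller could
treat a return value of 1 as the cue to queue that record for promotion
exactly once, no matter which source delivered the access.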

Currently I am considering only the number of accesses as an
indicator of page hotness. We need to consider the time of access
too. Both of them should contribute to the eventual "hotness" indicator.
Maybe something analogous to how memory tiering derives the
abstract distance (adistance) from bandwidth and latency could
be tried out.
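As a purely hypothetical illustration of letting both the count and the
recency contribute (this is not the adistance calculation and not
something the series implements), the access count could be halved for
every half_life time units elapsed since the last access. The function
name and parameters are made up for this sketch.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical hotness score: 'freq' accesses, decayed by one halving
 * per 'half_life' time units since 'last_access'.
 */
unsigned int hotness_score(unsigned int freq, unsigned long last_access,
			   unsigned long now, unsigned long half_life)
{
	unsigned long age, halvings;

	assert(half_life > 0);
	age = now >= last_access ? now - last_access : 0;
	halvings = age / half_life;
	if (halvings >= sizeof(freq) * CHAR_BIT)
		return 0; /* shifting by >= the type width is undefined in C */
	return freq >> halvings;
}
```

With half_life = 10, a page with 4 accesses 10 units ago scores 2, while
a page with 32 accesses 100 units ago scores 0, so recent activity wins.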

> 
>> The API for reporting the page access remains unchanged from [1].
>> When the page access gets recorded, the hotness data of the page
>> is updated and if it crosses a threshold, it gets tracked in the
>> heap as well. These heaps are per-target-node and corresponding
>> migrate threads will periodically extract the top records from
>> them and do batch migration.
>>
> 
> I don't quite follow the heaps and the tracking in the heap. Could
> you please clarify?

When different sub-systems report page accesses via the API
introduced by this new sub-system, a record for each such page
is stored in hash lists (hashed by PFN value). In addition to
the PFN and target_nid, the hotness record includes parameters
like frequency and time of access from which the hotness is
derived. Repeated access reports for the same PFN update the
record's hotness information. When the hotness of a record
(as updated while reporting an access) crosses a threshold,
the record becomes part of a max heap data structure. Records
in the max heap are ordered by hotness, so the top elements
of the heap correspond to the hottest pages. There is one
such heap for each toptier node, so that the per-toptier-node
kpromoted thread can easily extract the top N records from its
own heap and perform batched migration.
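A minimal userspace sketch of the per-node max heap side, assuming each
record stores its own heap position (heap_idx) so that a record already
in the heap can be moved up when its hotness grows. The names, the
two-node setup and the fixed capacity are hypothetical; the hash-list
lookup and locking are left out, and this is not the patchset's code.

```c
#include <assert.h>
#include <stddef.h>

#define NR_TOPTIER_NODES 2 /* hypothetical number of top-tier nodes */
#define HEAP_CAP 128       /* hypothetical per-node heap capacity */

/* Only the record fields the heap needs; hash linkage omitted. */
struct hot_rec {
	unsigned long pfn;
	int target_nid;
	unsigned int hotness;
	int heap_idx; /* position in its node's heap, -1 if not queued */
};

struct hot_heap {
	struct hot_rec *slot[HEAP_CAP];
	size_t n;
};

static struct hot_heap node_heap[NR_TOPTIER_NODES]; /* one per node */

static void heap_swap(struct hot_heap *h, size_t a, size_t b)
{
	struct hot_rec *t = h->slot[a];

	h->slot[a] = h->slot[b];
	h->slot[b] = t;
	h->slot[a]->heap_idx = (int)a;
	h->slot[b]->heap_idx = (int)b;
}

/* Move slot i up while it is hotter than its parent. */
static void heap_sift_up(struct hot_heap *h, size_t i)
{
	while (i > 0 && h->slot[(i - 1) / 2]->hotness < h->slot[i]->hotness) {
		heap_swap(h, i, (i - 1) / 2);
		i = (i - 1) / 2;
	}
}

/* Call after r->hotness grew: queue on threshold crossing, else re-order. */
void hot_heap_update(struct hot_rec *r, unsigned int threshold)
{
	struct hot_heap *h;

	assert(r->target_nid >= 0 && r->target_nid < NR_TOPTIER_NODES);
	h = &node_heap[r->target_nid];
	if (r->heap_idx >= 0) {
		heap_sift_up(h, (size_t)r->heap_idx);
	} else if (r->hotness >= threshold && h->n < HEAP_CAP) {
		r->heap_idx = (int)h->n;
		h->slot[h->n++] = r;
		heap_sift_up(h, (size_t)r->heap_idx);
	}
}

/* Remove and return the hottest record, or NULL if the heap is empty. */
static struct hot_rec *heap_pop(struct hot_heap *h)
{
	struct hot_rec *top;
	size_t i = 0;

	if (h->n == 0)
		return NULL;
	top = h->slot[0];
	h->slot[0] = h->slot[--h->n];
	h->slot[0]->heap_idx = 0;
	for (;;) { /* sift the moved record down */
		size_t l = 2 * i + 1, r = l + 1, big = i;

		if (l < h->n && h->slot[l]->hotness > h->slot[big]->hotness)
			big = l;
		if (r < h->n && h->slot[r]->hotness > h->slot[big]->hotness)
			big = r;
		if (big == i)
			break;
		heap_swap(h, i, big);
		i = big;
	}
	top->heap_idx = -1; /* set last: top may still sit in slot 0 */
	return top;
}

/* What a per-node migrate thread might do: take up to n hottest records. */
size_t hot_extract_top(int nid, struct hot_rec **batch, size_t n)
{
	size_t got = 0;

	while (got < n && node_heap[nid].n > 0)
		batch[got++] = heap_pop(&node_heap[nid]);
	return got;
}
```

Keeping heap_idx in the record lets a repeated access re-order an
already-queued record in O(log n) without searching the heap, and
hot_extract_top() hands one node's thread a batch in decreasing hotness
order.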

Hope this clarifies.

Regards,
Bharata.