Message-ID: <911f316b-87cf-45eb-8d9e-412473d7176a@amd.com>
Date: Wed, 11 Feb 2026 21:00:26 +0530
From: Bharata B Rao <bharata@....com>
To: <linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>
CC: <Jonathan.Cameron@...wei.com>, <dave.hansen@...el.com>,
<gourry@...rry.net>, <mgorman@...hsingularity.net>, <mingo@...hat.com>,
<peterz@...radead.org>, <raghavendra.kt@....com>, <riel@...riel.com>,
<rientjes@...gle.com>, <sj@...nel.org>, <weixugc@...gle.com>,
<willy@...radead.org>, <ying.huang@...ux.alibaba.com>, <ziy@...dia.com>,
<dave@...olabs.net>, <nifan.cxl@...il.com>, <xuezhengchu@...wei.com>,
<yiannis@...corp.com>, <akpm@...ux-foundation.org>, <david@...hat.com>,
<byungchul@...com>, <kinseyho@...gle.com>, <joshua.hahnjy@...il.com>,
<yuanchu@...gle.com>, <balbirs@...dia.com>, <alok.rathore@...sung.com>,
<shivankg@....com>
Subject: Re: [RFC PATCH v5 00/10] mm: Hot page tracking and promotion
infrastructure
On 29-Jan-26 8:10 PM, Bharata B Rao wrote:
>
> Results
> =======
> TODO: Will post benchmark numbers as reply to this patchset soon.
Here are Graph500 numbers for the hint fault source:
Test system details
-------------------
3-node AMD Zen5 system with two regular NUMA nodes (0, 1) and a CXL node (2)
$ numactl -H
available: 3 nodes (0-2)
node 0 cpus: 0-95,192-287
node 0 size: 128460 MB
node 1 cpus: 96-191,288-383
node 1 size: 128893 MB
node 2 cpus:
node 2 size: 257993 MB
node distances:
node   0   1   2
  0:  10  32  50
  1:  32  10  60
  2: 255 255  10
Hotness sources
---------------
NUMAB0 - NUMA balancing disabled in the base case and no hotness source
enabled in the pghot case. No migrations occur.
NUMAB2 - Existing hot page promotion in the base case and hint faults used
as the hotness source in the pghot case.
pghot promotes after two accesses by default, but for the NUMAB2 source
promotion is done after one access to match the base behaviour
(/sys/kernel/debug/pghot/freq_threshold=1). See the snippet below.
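
For reference, a minimal sketch of how the two configurations can be set up.
This assumes NUMAB2 is selected via the usual numa_balancing sysctl on both
kernels; the freq_threshold knob is the pghot debugfs interface from this
series:

  # Enable NUMA-balancing based hot page promotion (NUMAB2), both kernels
  echo 2 > /proc/sys/kernel/numa_balancing

  # pghot kernel only: promote after a single access to match base behaviour
  echo 1 > /sys/kernel/debug/pghot/freq_threshold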
Graph500 details
----------------
Command: mpirun -n 128 --bind-to core --map-by core
graph500/src/graph500_reference_bfs 28 16
After graph creation, the processes are stopped and their data is migrated
to CXL node 2 before continuing, so that the BFS phase starts out accessing
lower-tier memory.
Total memory usage is slightly over 100GB and fits within nodes 0 and 1,
so there is no memory pressure to induce demotions.
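
The exact mechanism used for the migration isn't shown here; as a rough
sketch, one way to do it is with migratepages(8) from the numactl package,
stopping each rank and moving its pages from nodes 0-1 to node 2 (the
process name and node lists below are assumptions based on the setup above):

  for pid in $(pgrep -f graph500_reference_bfs); do
      kill -STOP $pid            # pause the rank
      migratepages $pid 0,1 2    # move its pages from nodes 0-1 to node 2
      kill -CONT $pid            # resume
  done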
=====================================================================================
                        Base          Base          pghot-default  pghot-precise
                        NUMAB0        NUMAB2        NUMAB2         NUMAB2
=====================================================================================
harmonic_mean_TEPS      5.10676e+08   7.56804e+08   5.92473e+08    7.47091e+08
mean_time               8.41027       5.67508       7.24915        5.74886
median_TEPS             5.11535e+08   7.24252e+08   5.63155e+08    7.71638e+08
max_TEPS                5.1785e+08    1.06051e+09   7.88018e+08    1.0504e+09
pgpromote_success       0             13557718      13737730       13734469
numa_pte_updates        0             26491591      26848847       26726856
numa_hint_faults        0             13558077      13882743       13798024
=====================================================================================
- The base case shows a good improvement in harmonic_mean_TEPS with NUMAB2 (48%).
- The same improvement is maintained with pghot-precise too (46%).
- The pghot-default mode doesn't show a benefit even though it achieves similar
  page promotion numbers. This mode doesn't track the accessing NID and by
  default promotes to NID 0, which probably isn't all that beneficial since the
  processes are running on both node 0 and node 1.
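
For reference, the percentages above are ratios of the harmonic_mean_TEPS row:

  7.56804e+08 / 5.10676e+08 ~= 1.48  -> ~48% (Base NUMAB2 vs Base NUMAB0)
  7.47091e+08 / 5.10676e+08 ~= 1.46  -> ~46% (pghot-precise NUMAB2 vs Base NUMAB0)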