Message-ID: <4df58408-58d7-41ad-afa7-c42a64689ec8@amd.com>
Date: Mon, 9 Feb 2026 09:00:48 +0530
From: Bharata B Rao <bharata@....com>
To: <linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>
CC: <Jonathan.Cameron@...wei.com>, <dave.hansen@...el.com>,
<gourry@...rry.net>, <mgorman@...hsingularity.net>, <mingo@...hat.com>,
<peterz@...radead.org>, <raghavendra.kt@....com>, <riel@...riel.com>,
<rientjes@...gle.com>, <sj@...nel.org>, <weixugc@...gle.com>,
<willy@...radead.org>, <ying.huang@...ux.alibaba.com>, <ziy@...dia.com>,
<dave@...olabs.net>, <nifan.cxl@...il.com>, <xuezhengchu@...wei.com>,
<yiannis@...corp.com>, <akpm@...ux-foundation.org>, <david@...hat.com>,
<byungchul@...com>, <kinseyho@...gle.com>, <joshua.hahnjy@...il.com>,
<yuanchu@...gle.com>, <balbirs@...dia.com>, <alok.rathore@...sung.com>,
<shivankg@....com>
Subject: Re: [RFC PATCH v5 00/10] mm: Hot page tracking and promotion
infrastructure
On 29-Jan-26 8:10 PM, Bharata B Rao wrote:
> Results
> =======
> TODO: Will post benchmark numbers as reply to this patchset soon.
Numbers from redis-memtier benchmark:
Test system details
-------------------
3 node AMD Zen5 system with 2 regular NUMA nodes (0, 1) and a CXL node (2)
$ numactl -H
available: 3 nodes (0-2)
node 0 cpus: 0-95,192-287
node 0 size: 128460 MB
node 1 cpus: 96-191,288-383
node 1 size: 128893 MB
node 2 cpus:
node 2 size: 257993 MB
node distances:
node   0   1   2
  0:  10  32  50
  1:  32  10  60
  2: 255 255  10
Hotness sources
---------------
NUMAB0 - Without NUMA Balancing in base case and with no source enabled
in the patched case. No migrations occur.
NUMAB2 - Existing hot page promotion for the base case and
use of hint faults as source in the patched case.
Pghot by default promotes after two accesses but for NUMAB2 source,
promotion is done after one access to match the base behaviour.
(/sys/kernel/debug/pghot/freq_threshold=1)
==============================================================
Scenario 1 - Enough memory in toptier and hence only promotion
==============================================================
In the setup phase, a 64GB database is provisioned and explicitly moved
to Node 2 by migrating redis-server's memory to Node 2.
Memtier is run on Node 1.
Parallel distribution, 50% of the keys accessed, each 4 times.
16 Threads
100 Connections per thread
77808 Requests per client
==================================================================================================
Type            Ops/sec   Avg. Latency   p50 Latency   p99 Latency   p99.9 Latency       KB/sec
--------------------------------------------------------------------------------------------------
Base, NUMAB0
Totals        225827.75      226.49746     225.27900     425.98300       454.65500    513106.09
--------------------------------------------------------------------------------------------------
Base, NUMAB2
Totals        254869.29      205.61759     216.06300     399.35900       454.65500    579091.74
--------------------------------------------------------------------------------------------------
pghot-default, NUMAB2
Totals        264229.35      202.81411     215.03900     393.21500       446.46300    600358.86
--------------------------------------------------------------------------------------------------
pghot-precise, NUMAB2
Totals        261136.17      203.32692     215.03900     391.16700       446.46300    593330.81
==================================================================================================
                 pgpromote_success
==================================
Base, NUMAB0                     0
Base, NUMAB2            10,435,178
pghot-default, NUMAB2   10,435,031
pghot-precise, NUMAB2   10,435,245
==================================
- Hot page promotion shows a clear benefit. Both base and pghot show
similar benefits.
- The number of pages promoted is more or less the same in both cases.
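
To put the Scenario 1 throughput figures in relative terms, here is a small
Python sketch that computes the change in Ops/sec against the Base, NUMAB0
run. The Ops/sec values are copied from the table above; the helper name
relative_change_pct is only an illustrative choice and is not part of the
patchset or the benchmark tooling.

```python
# Ops/sec totals copied from the Scenario 1 table.
OPS_PER_SEC = {
    "Base, NUMAB0": 225827.75,
    "Base, NUMAB2": 254869.29,
    "pghot-default, NUMAB2": 264229.35,
    "pghot-precise, NUMAB2": 261136.17,
}


def relative_change_pct(value, baseline):
    """Percentage change of value relative to baseline (hypothetical helper)."""
    return (value - baseline) / baseline * 100.0


baseline = OPS_PER_SEC["Base, NUMAB0"]
for name, ops in OPS_PER_SEC.items():
    # Print each configuration with its signed change versus Base, NUMAB0.
    print(f"{name:24s} {relative_change_pct(ops, baseline):+6.2f}%")
```

With these inputs the three NUMAB2 runs come out at roughly 13%, 17% and 16%
above the no-migration baseline.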
===============================================================
Scenario 2 - Toptier memory overcommitted, promotion + demotion
===============================================================
In the setup phase, a 192GB database is provisioned. The database occupies
Node 1 entirely (~128GB) and spills over to Node 2 (~64GB).
Memtier is run on Node 1.
Parallel distribution, 50% of the keys accessed, each 4 times.
16 Threads
100 Connections per thread
233424 Requests per client
==================================================================================================
Type            Ops/sec   Avg. Latency   p50 Latency   p99 Latency   p99.9 Latency       KB/sec
--------------------------------------------------------------------------------------------------
Base, NUMAB0
Totals        246474.55      211.90623     192.51100     370.68700       448.51100    560235.63
--------------------------------------------------------------------------------------------------
Base, NUMAB2
Totals        232790.88      221.18604     214.01500     419.83900       509.95100    529132.72
--------------------------------------------------------------------------------------------------
pghot-default, NUMAB2
Totals        241615.60      216.12761     210.94300     391.16700       475.13500    549191.27
--------------------------------------------------------------------------------------------------
pghot-precise, NUMAB2
Totals        238557.37      217.57630     207.87100     395.26300       471.03900    542239.92
==================================================================================================
                         pgpromote_success      pgdemote_kswapd
===============================================================
Base, NUMAB0                             0              832,494
Base, NUMAB2                       352,075              720,409
pghot-default, NUMAB2           25,865,321           26,154,984
pghot-precise, NUMAB2           25,525,429           25,838,095
===============================================================
- No clear benefit is seen with hot page promotion in either the base or the
pghot case.
- Most promotion attempts in the base case fail because the NUMA hint fault
latency exceeds the threshold (default of 1000ms).
- Unlike base NUMAB2 where the hint fault latency is the difference between the
PTE update time (during scanning) and the access time (hint fault), pghot uses
a single latency threshold (4000ms in pghot-default and 5000ms in
pghot-precise) for two purposes.
1. If the time difference between successive accesses is within the
threshold, the page is marked as hot.
2. Later when kmigrated picks up the page for migration, it will migrate
only if the difference between the current time and the time when the
page was marked hot is within the threshold.
Because of the above difference in behaviour, more pages qualify for
promotion than in base NUMAB2.
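
The difference between the two checks can be illustrated with a rough Python
model. This is only a sketch of the behaviour described above, not the pghot
or NUMA balancing code: PageState, record_access, should_migrate and
base_numab2_allows are hypothetical names, and whether the comparisons are
strict or inclusive at the boundary is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageState:
    """Hypothetical per-page record; not the actual pghot data layout."""
    last_access_ms: Optional[int] = None  # time of the previous recorded access
    hot_since_ms: Optional[int] = None    # time the page was marked hot


def record_access(page: PageState, now_ms: int, threshold_ms: int) -> bool:
    """Use 1: mark the page hot if successive accesses fall within the threshold."""
    prev = page.last_access_ms
    page.last_access_ms = now_ms
    # Assumption: strict comparison; the real boundary handling may differ.
    marked = prev is not None and (now_ms - prev) < threshold_ms
    if marked:
        page.hot_since_ms = now_ms
    return marked


def should_migrate(page: PageState, now_ms: int, threshold_ms: int) -> bool:
    """Use 2: migrate only if the hot mark is still recent enough."""
    return page.hot_since_ms is not None and (now_ms - page.hot_since_ms) < threshold_ms


def base_numab2_allows(scan_ms: int, fault_ms: int, threshold_ms: int = 1000) -> bool:
    """Base NUMAB2 model: latency from PTE update (scan) to hint fault."""
    return (fault_ms - scan_ms) < threshold_ms


# Example: two accesses 3000ms apart with the pghot-default threshold of 4000ms.
page = PageState()
record_access(page, now_ms=0, threshold_ms=4000)      # first access, not hot yet
record_access(page, now_ms=3000, threshold_ms=4000)   # second access marks it hot
print(should_migrate(page, now_ms=6000, threshold_ms=4000))
print(base_numab2_allows(scan_ms=0, fault_ms=3000))
```

Note that the two thresholds measure different intervals (access-to-access and
hot-mark age in pghot, scan-to-fault in base NUMAB2), so the sketch only shows
why a longer, dual-use window lets more pages through.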