Message-ID: <63a6962f-6ffb-47cd-806d-ec568f0b2df7@suse.cz>
Date: Mon, 1 Sep 2025 22:34:32 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Andrew Morton <akpm@...ux-foundation.org>,
Ruan Shiyang <ruansy.fnst@...itsu.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org, lkp@...el.com,
ying.huang@...ux.alibaba.com, y-goto@...itsu.com, mingo@...hat.com,
peterz@...radead.org, juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, mgorman@...e.de,
vschneid@...hat.com, Li Zhijian <lizhijian@...itsu.com>,
Ben Segall <bsegall@...gle.com>, stable@...r.kernel.org
Subject: Re: [PATCH v3] mm: memory-tiering: fix PGPROMOTE_CANDIDATE counting

On 9/1/25 21:59, Andrew Morton wrote:
> On Mon, 1 Sep 2025 17:01:22 +0800 Ruan Shiyang <ruansy.fnst@...itsu.com> wrote:
>
>> Goto-san reported confusing pgpromote statistics where the
>> pgpromote_success count significantly exceeded pgpromote_candidate.
>>
>> On a system with three nodes (nodes 0-1: DRAM 4GB, node 2: NVDIMM 4GB):
>> # Enable demotion only
>> echo 1 > /sys/kernel/mm/numa/demotion_enabled
>> numactl -m 0-1 memhog -r200 3500M >/dev/null &
>> pid=$!
>> sleep 2
>> numactl memhog -r100 2500M >/dev/null &
>> sleep 10
>> kill -9 $pid # terminate the 1st memhog
>> # Enable promotion
>> echo 2 > /proc/sys/kernel/numa_balancing
>>
>> After a few seconds, we observed `pgpromote_candidate < pgpromote_success`:
>> $ grep -e pgpromote /proc/vmstat
>> pgpromote_success 2579
>> pgpromote_candidate 0
>>
>> In this scenario, after terminating the first memhog, the conditions for
>> pgdat_free_space_enough() are quickly met, which triggers promotion.
>> However, these migrated pages are only counted in PGPROMOTE_SUCCESS,
>> not in PGPROMOTE_CANDIDATE.
>>
>> To resolve these confusing statistics, introduce PGPROMOTE_CANDIDATE_NRL to
>> count the missed promotion pages. These pages are deliberately not counted
>> in PGPROMOTE_CANDIDATE, to avoid changing the existing algorithm or
>> performance of the promotion rate limit.
>>
>> ...
>>
>
> It would be good to have a Fixes: here, to tell people how far back to
> backport it.
>
> Could be either c6833e10008f or c959924b0dc5 afaict. I'll go with
> c6833e10008f, OK?

LGTM as a helpful pointer, but I don't think Cc: stable is necessary for an
"admin might be confused" kind of thing if it has been there since 6.1 and
only came up now.
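
For anyone wanting to check a machine for the symptom described in the quoted report, a minimal sketch follows. It is not part of the patch: check_pgpromote is a hypothetical helper name, and the script only assumes the two counter names shown in the report's /proc/vmstat output. It exits with status 1 when pgpromote_success exceeds pgpromote_candidate.

```shell
# Hypothetical helper (not part of the patch): compare the two counters
# from a vmstat-formatted file, defaulting to /proc/vmstat.
check_pgpromote() {
    awk '
        $1 == "pgpromote_success"   { success = $2 }
        $1 == "pgpromote_candidate" { candidate = $2 }
        END {
            # Counters missing from the input are printed as 0.
            printf "success=%d candidate=%d\n", success, candidate
            # Exit status 1 flags the confusing case from the report.
            exit ((success > candidate) ? 1 : 0)
        }
    ' "${1:-/proc/vmstat}"
}
```

Running check_pgpromote with no argument reads the live counters; passing a saved copy of /proc/vmstat as the first argument checks that snapshot instead.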