Message-ID: <20251006105346.00004634@huawei.com>
Date: Mon, 6 Oct 2025 10:53:46 +0100
From: Jonathan Cameron <jonathan.cameron@...wei.com>
To: Bharata B Rao <bharata@....com>
CC: <linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
<dave.hansen@...el.com>, <gourry@...rry.net>, <hannes@...xchg.org>,
<mgorman@...hsingularity.net>, <mingo@...hat.com>, <peterz@...radead.org>,
<raghavendra.kt@....com>, <riel@...riel.com>, <rientjes@...gle.com>,
<sj@...nel.org>, <weixugc@...gle.com>, <willy@...radead.org>,
<ying.huang@...ux.alibaba.com>, <ziy@...dia.com>, <dave@...olabs.net>,
<nifan.cxl@...il.com>, <xuezhengchu@...wei.com>, <yiannis@...corp.com>,
<akpm@...ux-foundation.org>, <david@...hat.com>, <byungchul@...com>,
<kinseyho@...gle.com>, <joshua.hahnjy@...il.com>, <yuanchu@...gle.com>,
<balbirs@...dia.com>, <alok.rathore@...sung.com>
Subject: Re: [RFC PATCH v2 8/8] mm: sched: Move hot page promotion from
NUMAB=2 to kpromoted
On Mon, 6 Oct 2025 11:27:21 +0530
Bharata B Rao <bharata@....com> wrote:
> On 03-Oct-25 6:08 PM, Jonathan Cameron wrote:
> > On Wed, 10 Sep 2025 20:16:53 +0530
> > Bharata B Rao <bharata@....com> wrote:
> >
> >> Currently hot page promotion (NUMA_BALANCING_MEMORY_TIERING
> >> mode of NUMA Balancing) does hot page detection (via hint faults),
> >> hot page classification and eventual promotion, all by itself and
> >> sits within the scheduler.
> >>
> >> With the new hot page tracking and promotion mechanism being
> >> available, NUMA Balancing can limit itself to detection of
> >> hot pages (via hint faults) and off-load rest of the
> >> functionality to the common hot page tracking system.
> >>
> >> pghot_record_access(PGHOT_HINT_FAULT) API is used to feed the
> >> hot page info. In addition, the migration rate limiting and
> >> dynamic threshold logic are moved to kpromoted so that the same
> >> can be used for hot pages reported by other sources too.
> >>
> >> Signed-off-by: Bharata B Rao <bharata@....com>
> >
> > Making a direct replacement without any fallback to previous method
> > is going to need a lot of data to show there are no important regressions.
> >
> > So bold move if that's the intent!
>
> Firstly I am only moving the existing hot page heuristics that is part of
> NUMAB=2 to kpromoted so that the same can be applied to hot pages being
> identified by other sources. So the hint fault mechanism that is inherent
> to NUMAB=2 still remains.
That makes sense.
>
> In fact, kscand effort started as a potential replacement for the existing
> hot page promotion mechanism by getting rid of hint faults and moving the
> page table scanning out of process context.
Understood, and I'm in favor of that approach, but I'm not sure it will be
a fit for all workloads.
>
> In any case, I will start including numbers from the next post.
Great.
> >>
> >> static unsigned int sysctl_pghot_freq_window = KPROMOTED_FREQ_WINDOW;
> >>
> >> +/* Restrict the NUMA promotion throughput (MB/s) for each target node. */
> >> +static unsigned int sysctl_pghot_promote_rate_limit = 65536;
> >
> > If the comment correlates with the value, this is 64 GiB/s? That seems
> > unlikely, though I guess possible.
>
> IIUC, the existing logic tries to limit promotion rate to 64 GiB/s by
> limiting the number of candidate pages that are promoted within the
> 1s observation interval.
>
> Are you saying that achieving the rate of 64 GiB/s is not possible
> or unlikely?
Seems rather too high to me, but maybe I just have the wrong mental model
of what we should be moving.
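For scale, here is a rough sketch of the arithmetic behind that default.
It assumes 4 KiB base pages, and that the MB/s value is turned into a
per-second page budget by shifting left by (20 - PAGE_SHIFT), which is
how I believe the existing NUMAB=2 rate limit works. The helper names and
EXAMPLE_PAGE_SHIFT are made up for illustration and are not from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages. Other page sizes change the result. */
#define EXAMPLE_PAGE_SHIFT 12

/*
 * Hypothetical helper: convert a limit given in MB/s (treated as MiB,
 * i.e. 1 << 20 bytes) into a number of pages per second.
 */
static inline uint64_t mbps_to_pages_per_sec(uint64_t mbps,
					     unsigned int page_shift)
{
	/* A page larger than 1 MiB would make the shift negative. */
	assert(page_shift <= 20);
	return mbps << (20 - page_shift);
}

/* Hypothetical helper: the same limit expressed in whole GiB/s. */
static inline uint64_t mbps_to_gib_per_sec(uint64_t mbps)
{
	return mbps >> 10;
}
```

With the default of 65536, mbps_to_gib_per_sec() gives 64 and
mbps_to_pages_per_sec(65536, EXAMPLE_PAGE_SHIFT) gives 16777216, so the
default allows roughly 16M base pages per second per target node.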
>
> >
> >> +
> >> #ifdef CONFIG_SYSCTL
> >> static const struct ctl_table pghot_sysctls[] = {
> >> {
> >> @@ -44,8 +50,17 @@ static const struct ctl_table pghot_sysctls[] = {
> >> .proc_handler = proc_dointvec_minmax,
> >> .extra1 = SYSCTL_ZERO,
> >> },
> >> + {
> >> + .procname = "pghot_promote_rate_limit_MBps",
> >> + .data = &sysctl_pghot_promote_rate_limit,
> >> + .maxlen = sizeof(unsigned int),
> >> + .mode = 0644,
> >> + .proc_handler = proc_dointvec_minmax,
> >> + .extra1 = SYSCTL_ZERO,
> >> + },
> >> };
> >> #endif
> >> +
> > Put that in earlier patch to reduce noise here.
>
> This patch moves the hot page heuristics to kpromoted and hence this
> related sysctl is also being moved in this patch.
I just mean the blank line, not the block above.
This is just a patch-set tidying comment.
Jonathan