Message-ID: <1db2d6df-16ff-4521-ada5-da585b87b06f@os.amperecomputing.com>
Date: Thu, 21 Aug 2025 19:18:42 +0800
From: Adam Li <adamli@...amperecomputing.com>
To: Valentin Schneider <vschneid@...hat.com>, mingo@...hat.com,
peterz@...radead.org, juri.lelli@...hat.com, vincent.guittot@...aro.org
Cc: dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, cl@...ux.com, frederic@...nel.org,
linux-kernel@...r.kernel.org, patches@...erecomputing.com
Subject: Re: [PATCH] sched/nohz: Fix NOHZ imbalance by adding options for ILB
CPU
On 8/20/2025 7:46 PM, Valentin Schneider wrote:
>
> I'd say resend the whole series with the right folks cc'd.
>
OK. I resent the patch series.
Please refer to: https://lore.kernel.org/all/20250821042707.62993-1-adamli@os.amperecomputing.com/
>> 'nohz_full' option is supposed to benefit performance by reducing kernel
>> noise I think. Could you please give more detail on
>> 'NOHZ_FULL context switch overhead'?
>>
>
> The doc briefly touches on that:
>
> https://docs.kernel.org/timers/no_hz.html#omit-scheduling-clock-ticks-for-cpus-with-only-one-runnable-task
>
> The longer story: have a look at kernel/context_tracking.c; every
> transition into and out of the kernel to and from user or idle requires
> additional atomic operations and synchronization.
>
> It would be worth quantifying how much these processes sleep/context
> switch; it could be that keeping the tick enabled incurs a lower
> throughput penalty than the NO_HZ_FULL overheads.
>
Thanks for the information.
>>> As for the actual balancing, yeah if you have idle NOHZ_FULL CPUs they
>>> won't do the periodic balance; the residual 1Hz remote tick doesn't do that
>>> either. But they should still do the newidle balance to pull work before
>>> going tickless idle, and wakeup balance should help as well, albeit that
>>> also depends on your topology.
>>>
>>
>> I think the newidle balance and wakeup balance do not help in this case
>> because the workload rarely sleeps or wakes up.
>>
>
> Right. So other than the NO_HZ_FULL vs NO_HZ_IDLE considerations above, you
> could manually affine the threads of the workload. Depending on how much
> control you have over how many threads it spawns, you could either pin
> one thread per CPU, or just spawn the workload into a cpuset covering the
> NO_HZ_FULL CPUs.
>
Yes, binding the threads to CPUs can work around the performance
issue caused by load imbalance. Should we document that 'nohz_full' may prevent
scheduler load balancing from working well, and that setting CPU affinity
explicitly is preferred?
> Having the scheduler do the balancing is a bit of a precarious
> situation. Your single housekeeping CPU is pretty much going to be always
> running things, does it make sense to have it run the NOHZ idle balance
> when there are available idle NOHZ_FULL CPUs? And in the same sense, does
> it make sense to disturb an idle NOHZ_FULL CPU to get it to spread load on
> other NOHZ_FULL CPUs? Admins that manually affine their threads will
> probably say no.
>
I think when a NOHZ_FULL CPU is added to nohz.idle_cpus_mask and
its tick is stopped, the CPU is 'very' idle. We can safely assign some work to it.
> 9b019acb72e4 ("sched/nohz: Run NOHZ idle load balancer on HK_FLAG_MISC CPUs")
> also mentions SMT being an issue.
>
From the commit message of 9b019acb72e4:
"The problem was observed with increased jitter on an application
running on CPU0, caused by NOHZ idle load balancing being run on
CPU1 (an SMT sibling)."
Can we say that, with *no* SMT, it is safe to run NOHZ idle load balancing
on a CPU in nohz.idle_cpus_mask? My patch checks '!sched_smt_active()' when
searching nohz.idle_cpus_mask.
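The selection logic I have in mind can be sketched in user space roughly as follows. This is a simplified model, not the actual kernel code: the idle mask is a plain bool array standing in for nohz.idle_cpus_mask, smt_active stands in for sched_smt_active(), and CPU 0 stands in for the housekeeping CPU.

```c
#include <stdbool.h>

/* Simplified model of picking an ILB CPU: with SMT active, fall
 * back to the housekeeping CPU (here CPU 0) to avoid disturbing
 * an SMT sibling; without SMT, any tickless idle CPU may be
 * chosen.  Names are illustrative, not the kernel's. */
static int find_ilb_cpu(const bool *idle_mask, int nr_cpus, bool smt_active)
{
	if (!smt_active) {
		for (int cpu = 0; cpu < nr_cpus; cpu++)
			if (idle_mask[cpu])
				return cpu;
	}
	return 0; /* housekeeping CPU */
}
```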
Thanks,
-adam