Message-ID: <xhsmho6sagz7p.mognet@vschneid-thinkpadt14sgen2i.remote.csb>
Date: Wed, 20 Aug 2025 10:43:38 +0200
From: Valentin Schneider <vschneid@...hat.com>
To: Adam Li <adamli@...amperecomputing.com>, mingo@...hat.com,
peterz@...radead.org, juri.lelli@...hat.com, vincent.guittot@...aro.org
Cc: dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, cl@...ux.com, frederic@...nel.org,
linux-kernel@...r.kernel.org, patches@...erecomputing.com
Subject: Re: [PATCH] sched/nohz: Fix NOHZ imbalance by adding options for ILB CPU

On 20/08/25 11:35, Adam Li wrote:
> On 8/19/2025 10:00 PM, Valentin Schneider wrote:
>>
>> I'm not understanding why, in the scenarios outlined above, more NOHZ idle
>> balancing is a good thing.
>>
>> Considering only housekeeping CPUs, they're all covered by wakeup, periodic
>> and idle balancing (on top of NOHZ idle balancing when relevant). So if
>> find_new_ilb() never finds a NOHZ-idle CPU, then that means your HK CPUs
>> are either always busy or never stopping the tick when going idle, IOW they
>> always have some work to do within a jiffy boundary.
>>
>> Am I missing something?
>>
>
> I agree with your description about the housekeeping CPUs. In the worst case,
> the system only has one housekeeping CPU and this housekeeping CPU is so busy
> that:
> 1) This housekeeping CPU is unlikely idle;
> 2) and this housekeeping CPU is unlikely in 'nohz.idle_cpus_mask' because tick
> is not stopped.
> Therefore find_new_ilb() may very likely return -1. *No* CPU can be selected
> to do NOHZ idle load balancing.
>
> This patch tries to fix the imbalance of NOHZ idle CPUs (CPUs in nohz.idle_cpus_mask).
> Here is more background:
>
> When running llama on arm64 server, some CPUs *keep* idle while others
> are 100% busy. All CPUs are in 'nohz_full=' cpu list, and CONFIG_NO_HZ_FULL
> is set.
>

I assume you mean all but one CPU is in 'nohz_full=' since you need at
least one housekeeping CPU. But in that case this becomes a slightly
different problem, since no CPU in 'nohz_full' will be in
housekeeping_cpumask(HK_TYPE_KERNEL_NOISE).
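
To make that concrete, here is a small user-space model of the ILB
selection as I read it in kernel/sched/fair.c. It is a sketch, not the
kernel code: the names model_mask_t, struct ilb_model and
model_find_new_ilb() are made up for illustration, and the masks are
plain 64-bit integers rather than real cpumasks. The only point is that
the candidate set is the intersection of the NOHZ-idle mask and the
housekeeping mask, so NOHZ_FULL CPUs are never eligible.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per CPU; this toy model supports at most 64 CPUs. */
typedef uint64_t model_mask_t;

/* Hypothetical stand-ins for the kernel state, for illustration only. */
struct ilb_model {
	model_mask_t nohz_idle;    /* models nohz.idle_cpus_mask */
	model_mask_t housekeeping; /* models housekeeping_cpumask(HK_TYPE_KERNEL_NOISE) */
	model_mask_t idle;         /* CPUs for which idle_cpu() would return true */
};

/*
 * Return the first CPU that is NOHZ-idle, housekeeping, idle, and not
 * the calling CPU; return -1 when no such CPU exists.
 */
static int model_find_new_ilb(const struct ilb_model *m, int this_cpu)
{
	model_mask_t candidates = m->nohz_idle & m->housekeeping;

	assert(this_cpu >= 0 && this_cpu < 64);

	for (int cpu = 0; cpu < 64; cpu++) {
		model_mask_t bit = (model_mask_t)1 << cpu;

		if (!(candidates & bit))
			continue;	/* not NOHZ-idle or not housekeeping */
		if (cpu == this_cpu)
			continue;	/* the kicking CPU cannot pick itself */
		if (m->idle & bit)
			return cpu;
	}

	return -1;
}
```

With CPU0 as the only housekeeping CPU and CPUs 4-7 sitting in the
NOHZ-idle mask, the intersection is empty and the model returns -1,
regardless of how many NOHZ_FULL CPUs are idle.
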
> The problem is caused by two issues:
> 1) Some idle CPUs cannot be added to 'nohz.idle_cpus_mask',
> this bug is fixed by another patch:
> https://lore.kernel.org/all/20250815065115.289337-2-adamli@os.amperecomputing.com/
>
> 2) Even if the idle CPUs are in 'nohz.idle_cpus_mask', *no* CPU can be selected to
> do NOHZ idle load balancing because conditions in find_new_ilb() is too strict.
> This patch tries to solve this issue.
>
> Hope this information helps.
>

I hadn't seen that patch; that cclist is quite small, you'll want to add
the scheduler people to your next submission.

So IIUC:
- Pretty much all your CPUs are NOHZ_FULL
- When they go idle they remain so for a while despite work being available

My first question would be: is NOHZ_FULL really right for your workload?
It's mainly designed to be used with always-running userspace tasks,
generally affined to a CPU by the system administrator.

Here AIUI you're relying on the scheduler load balancing to distribute work
to the NOHZ_FULL CPUs, so you're going to be penalized a lot by the
NOHZ_FULL context switch overheads. What's the point? Wouldn't you have
less overhead with just NOHZ_IDLE?

As for the actual balancing, yeah if you have idle NOHZ_FULL CPUs they
won't do the periodic balance; the residual 1Hz remote tick doesn't do that
either. But they should still do the newidle balance to pull work before
going tickless idle, and wakeup balance should help as well, albeit that
also depends on your topology.

Could you share your system topology and your actual nohz_full cmdline?

> Thanks,
> -adam
>
>