Message-ID: <xhsmho6sagz7p.mognet@vschneid-thinkpadt14sgen2i.remote.csb>
Date: Wed, 20 Aug 2025 10:43:38 +0200
From: Valentin Schneider <vschneid@...hat.com>
To: Adam Li <adamli@...amperecomputing.com>, mingo@...hat.com,
 peterz@...radead.org, juri.lelli@...hat.com, vincent.guittot@...aro.org
Cc: dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
 mgorman@...e.de, cl@...ux.com, frederic@...nel.org,
 linux-kernel@...r.kernel.org, patches@...erecomputing.com
Subject: Re: [PATCH] sched/nohz: Fix NOHZ imbalance by adding options for
 ILB CPU

On 20/08/25 11:35, Adam Li wrote:
> On 8/19/2025 10:00 PM, Valentin Schneider wrote:
>>
>> I'm not understanding why, in the scenarios outlined above, more NOHZ idle
>> balancing is a good thing.
>>
>> Considering only housekeeping CPUs, they're all covered by wakeup, periodic
>> and idle balancing (on top of NOHZ idle balancing when relevant). So if
>> find_new_ilb() never finds a NOHZ-idle CPU, then that means your HK CPUs
>> are either always busy or never stopping the tick when going idle, IOW they
>> always have some work to do within a jiffy boundary.
>>
>> Am I missing something?
>>
>
> I agree with your description of the housekeeping CPUs. In the worst case,
> the system has only one housekeeping CPU, and that housekeeping CPU is so
> busy that:
> 1) it is unlikely to be idle;
> 2) it is unlikely to be in 'nohz.idle_cpus_mask', because its tick is not
> stopped.
> Therefore find_new_ilb() will very likely return -1, and *no* CPU can be
> selected to do NOHZ idle load balancing.
>
> This patch tries to fix the imbalance of NOHZ idle CPUs (CPUs in nohz.idle_cpus_mask).
> Here is more background:
>
> When running llama on an arm64 server, some CPUs *stay* idle while others
> are 100% busy. All CPUs are in the 'nohz_full=' CPU list, and CONFIG_NO_HZ_FULL
> is set.
>

I assume you mean all but one CPU is in 'nohz_full=', since you need at
least one housekeeping CPU. But in that case this becomes a slightly
different problem, since no CPU in 'nohz_full' will be in

  housekeeping_cpumask(HK_TYPE_KERNEL_NOISE)
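
For context, the ILB CPU selection looks roughly like this (a simplified
sketch of find_new_ilb() from kernel/sched/fair.c, paraphrased from memory
rather than quoted verbatim):

  static inline int find_new_ilb(void)
  {
          const struct cpumask *hk_mask;
          int ilb_cpu;

          hk_mask = housekeeping_cpumask(HK_TYPE_KERNEL_NOISE);

          /* Only tickless-idle CPUs that are also housekeeping CPUs qualify. */
          for_each_cpu_and(ilb_cpu, nohz.idle_cpus_mask, hk_mask) {
                  if (ilb_cpu == smp_processor_id())
                          continue;

                  if (idle_cpu(ilb_cpu))
                          return ilb_cpu;
          }

          /* Nothing qualifies, e.g. when (almost) every CPU is in nohz_full=. */
          return -1;
  }

IOW find_new_ilb() only ever considers CPUs that are both tickless-idle
*and* housekeeping, so in your setup that intersection is essentially empty
and -1 is what you get.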

> The problem is caused by two issues:
> 1) Some idle CPUs cannot be added to 'nohz.idle_cpus_mask'; this bug is
> fixed by another patch:
> https://lore.kernel.org/all/20250815065115.289337-2-adamli@os.amperecomputing.com/
>
> 2) Even if the idle CPUs are in 'nohz.idle_cpus_mask', *no* CPU can be selected to
> do NOHZ idle load balancing because the conditions in find_new_ilb() are too strict.
> This patch tries to solve that issue.
>
> Hope this information helps.
>

I hadn't seen that patch; its Cc list is quite small, so you'll want to add
the scheduler people to your next submission.

So IIUC:
- Pretty much all your CPUs are NOHZ_FULL
- When they go idle they remain so for a while despite work being available

My first question would be: is NOHZ_FULL really right for your workload?
It's mainly designed to be used with always-running userspace tasks,
generally affined to a CPU by the system administrator.
Here, AIUI, you're relying on the scheduler's load balancing to distribute
work to the NOHZ_FULL CPUs, so you're going to be penalized a lot by the
NOHZ_FULL context-switch overheads. What's the point? Wouldn't you have
less overhead with just NOHZ_IDLE?

As for the actual balancing: yeah, if you have idle NOHZ_FULL CPUs they
won't do the periodic balance, and the residual 1Hz remote tick doesn't do
that either. But they should still do a newidle balance to pull work before
going tickless idle, and wakeup balancing should help as well, though that
also depends on your topology.
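
And to tie that back to your point 2): when a busy CPU does decide a NOHZ
kick is warranted, the whole thing silently becomes a no-op if
find_new_ilb() comes back empty. Roughly (again a paraphrased sketch of
kick_ilb() in kernel/sched/fair.c, not a verbatim copy):

  static void kick_ilb(unsigned int flags)
  {
          int ilb_cpu;

          ilb_cpu = find_new_ilb();
          if (ilb_cpu < 0)
                  return;         /* nobody to kick: the NOHZ idle balance never runs */

          /*
           * Otherwise flag the chosen CPU and IPI it so that it runs the idle
           * load balance on behalf of all tickless-idle CPUs (the real code
           * also avoids re-kicking a CPU that has already been flagged).
           */
          atomic_fetch_or(flags, nohz_flags(ilb_cpu));
          smp_call_function_single_async(ilb_cpu, &cpu_rq(ilb_cpu)->nohz_csd);
  }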

Could you share your system topology and your actual nohz_full cmdline?

> Thanks,
> -adam
>
>

