linux-kernel - Re: [PATCH] sched/nohz: Fix NOHZ imbalance by adding options for ILB CPU


Message-ID: <f6869880-1f7c-a39b-dc8e-4c3a84ba51ef@gentwo.org>
Date: Wed, 20 Aug 2025 10:31:24 -0700 (PDT)
From: "Christoph Lameter (Ampere)" <cl@...two.org>
To: Valentin Schneider <vschneid@...hat.com>
cc: Adam Li <adamli@...amperecomputing.com>, mingo@...hat.com, 
    peterz@...radead.org, juri.lelli@...hat.com, vincent.guittot@...aro.org, 
    dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com, 
    mgorman@...e.de, frederic@...nel.org, linux-kernel@...r.kernel.org, 
    patches@...erecomputing.com
Subject: Re: [PATCH] sched/nohz: Fix NOHZ imbalance by adding options for
 ILB CPU

On Wed, 20 Aug 2025, Valentin Schneider wrote:

> My first question would be: is NOHZ_FULL really right for your workload?

Yes, performance is improved. AI workloads are like HPC workloads in that
they need to do compute and then rendezvous for data exchange. Variations
in the runtime due to timer ticks cause idle periods where the rendezvous
cannot be completed because some cpus are delayed.

The more frequently the rendezvous can be performed, the better the
performance numbers will be.
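
As a rough illustration of the rendezvous effect (a minimal sketch with
made-up numbers, not a measurement from this thread): a step only finishes
when the slowest cpu arrives, so a delay on one cpu turns into idle time on
all the others. The function names and the 100 us / 15 us figures below are
hypothetical.

```python
def rendezvous_step_time(compute_times):
    """A step ends only when the slowest participant reaches the rendezvous."""
    return max(compute_times)


def idle_time(compute_times):
    """Total time the faster cpus spend waiting for the slowest one."""
    step = rendezvous_step_time(compute_times)
    return sum(step - t for t in compute_times)


# Hypothetical numbers: 4 cpus, 100 us of compute each per step.
quiet = [100, 100, 100, 100]
# Same step, but one cpu loses 15 us to an interruption such as a timer tick.
noisy = [100, 100, 115, 100]

print(rendezvous_step_time(quiet), idle_time(quiet))  # 100 0
print(rendezvous_step_time(noisy), idle_time(noisy))  # 115 45
```

In this toy model a single 15 us delay lengthens the whole step by 15 us
and leaves the three other cpus idle for 45 us in total.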

> It's mainly designed to be used with always-running userspace tasks,
> generally affined to a CPU by the system administrator.

nohz_full has been reworked somewhat since the early days and works in a
more general way today.

> Here AIUI you're relying on the scheduler load balancing to distribute work
> to the NOHZ_FULL CPUs, so you're going to be penalized a lot by the
> NOHZ_FULL context switch overheads. What's the point? Wouldn't you have
> less overhead with just NOHZ_IDLE?

The benchmarks show a regression of 10-20% if the tick is operational.
The context switch overhead is negligible since the cpus are doing compute
and not system calls.

> As for the actual balancing, yeah if you have idle NOHZ_FULL CPUs they
> won't do the periodic balance; the residual 1Hz remote tick doesn't do that
> either. But they should still do the newidle balance to pull work before
> going tickless idle, and wakeup balance should help as well, albeit that
> also depends on your topology.

That should work in general and not depend on any hardware topology. In
this case we have a linear sched domain including all processors.