linux-kernel - Re: [PATCH v4 1/3] sched/fair: Move checking for nohz cpus after time check

Message-ID: <134bdd32-6ca9-4860-9c3b-786411fadbe9@linux.ibm.com>
Date: Tue, 13 Jan 2026 16:48:04 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: mingo@...nel.org, peterz@...radead.org, linux-kernel@...r.kernel.org,
        kprateek.nayak@....com, juri.lelli@...hat.com, vschneid@...hat.com,
        tglx@...nel.org, dietmar.eggemann@....com, anna-maria@...utronix.de,
        frederic@...nel.org, wangyang.guo@...el.com
Subject: Re: [PATCH v4 1/3] sched/fair: Move checking for nohz cpus after time
 check



On 1/13/26 3:51 PM, Vincent Guittot wrote:
> On Tue, 13 Jan 2026 at 10:23, Shrikanth Hegde <sshegde@...ux.ibm.com> wrote:
>>
>>
>>
>> On 1/13/26 2:37 PM, Vincent Guittot wrote:
>>> On Mon, 12 Jan 2026 at 06:05, Shrikanth Hegde <sshegde@...ux.ibm.com> wrote:
>>>>
>>>> NOHZ idle load balancer is kicked off only after time check. So move
>>>> the atomic read after the time check to access it only when needed.
>>>>
>>>> When there are no idle CPUs(100% busy), even if the flag gets set to
>>>> NOHZ_STATS_KICK | NOHZ_NEXT_KICK, find_new_ilb will fail and
>>>> there will be no NOHZ idle balance. The current behaviour is retained.
>>>>
>>>> Note: This patch doesn't solve any cacheline overheads. No improvement
>>>> in performance apart from saving a few cycles of atomic_read.
>>>
>>> But won't these cycles be then wasted by calling needlessly kick_ilb
>>>
>>
>> When there are nohz CPUs, i.e. nohz.nr_cpus > 0, there is no change in code flow.
>>
>> Only when the system is 100% busy (which is expected to be rare) is nohz.nr_cpus == 0,
>> and then has_blocked_load is expected to be 0. So flags shouldn't be set.
> 
> The way we are setting/clearing has_blocked_load vs
> nr_cpus/idle_cpus_mask implies that it's possible to get
> has_blocked_load == 1 but nr_cpus == 0 although it's a corner case and
> not a default behavior
> 
> No CPUs are idle: nr_cpus == 0
> 
> CPU 0 enters idle
>    - inc nr_cpus and set idle_cpus_mask
>    - set nohz.has_blocked
> 
> CPU0 wakes up
> 
> Tick fires on CPU0
>    - dec nr_cpus and clear idle_cpus_mask
>    - nohz.has_blocked == 1, most probably now > nohz.next_blocked, if
> now < nohz.next_balance, we skip the test of nr_cpus and we call
> kick_ilb() but nr_cpus == 0 and idle_cpus_mask is empty
> 
>> Note we are still doing a return if nohz.nr_cpus == 0. So kick_ilb shouldn't be
>> called.
> 
> The return can be skipped by if (time_before(now, nohz.next_balance)) goto out
> 
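
To make the trace above concrete, here is a minimal user-space sketch in C
of the two check orderings. It is not the kernel code: struct model_nohz,
the MODEL_* flag values and all model_* helpers are hypothetical names
invented for illustration, and the busy-CPU heuristics that follow the time
check are left out. It assumes the patched function still returns when
nr_cpus == 0 once the time check has passed, as described in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values, not the kernel's definitions. */
enum { MODEL_STATS_KICK = 1 << 0, MODEL_NEXT_KICK = 1 << 1 };

/* Hypothetical stand-in for the nohz fields discussed in this thread. */
struct model_nohz {
	int nr_cpus;
	bool has_blocked;
	bool needs_update;
	unsigned long next_blocked;
	unsigned long next_balance;
};

/* Wraparound-safe tick comparisons, same idea as time_after()/time_before(). */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

static bool model_time_before(unsigned long a, unsigned long b)
{
	return model_time_after(b, a);
}

/* CPU enters idle: count it and mark blocked load as pending. */
static void model_enter_idle(struct model_nohz *n)
{
	n->nr_cpus++;
	n->has_blocked = true;
}

/* Tick on a woken CPU: drop it from the count; has_blocked stays set. */
static void model_exit_idle(struct model_nohz *n)
{
	assert(n->nr_cpus > 0);
	n->nr_cpus--;
}

/*
 * Returns the flags that would be handed to kick_ilb(); 0 means no kick.
 * nr_cpus_first == true models the current ordering (nr_cpus tested before
 * the time check), false models the ordering proposed in this patch.
 */
static unsigned int model_kick_flags(const struct model_nohz *n,
				     unsigned long now, bool nr_cpus_first)
{
	unsigned int flags = 0;

	if (nr_cpus_first && n->nr_cpus == 0)
		return 0;

	if (n->has_blocked && model_time_after(now, n->next_blocked))
		flags = MODEL_STATS_KICK;

	/* Before next_balance we jump past the later nr_cpus test. */
	if (model_time_before(now, n->next_balance))
		goto out;

	if (!nr_cpus_first && n->nr_cpus == 0)
		return 0;

	/* Busy-CPU heuristics that may request a full kick are omitted. */
out:
	if (n->needs_update)
		flags |= MODEL_NEXT_KICK;
	return flags;
}
```

With next_blocked = 32 and next_balance = 60, after one
model_enter_idle()/model_exit_idle() pair, model_kick_flags(&n, 40, false)
returns MODEL_STATS_KICK while model_kick_flags(&n, 40, true) returns 0.
That is the corner case in the trace above: a kick is requested although
nr_cpus == 0.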

Assuming HZ=1000, I see LOAD_AVG_PERIOD = 32, whereas next_balance is
usually 60. So this can only happen in a really narrow window of about
28 ticks, and only while the system is close to 100% busy.


>>
>> Do you see any path still calling kick_ilb unnecessarily?
> 
> Yes but at the same time it's clearly not the main case
> 
> 

So I assume we can go ahead with this patch, considering the common case?
If not, let me know and I can drop it.