Message-ID: <CAKfTPtAF4ZxMF3d01TKOTXnB3MiBnaPYfMvOffs2EY3ttAHRNA@mail.gmail.com>
Date: Tue, 13 Jan 2026 14:20:24 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Shrikanth Hegde <sshegde@...ux.ibm.com>
Cc: mingo@...nel.org, peterz@...radead.org, linux-kernel@...r.kernel.org, 
	kprateek.nayak@....com, juri.lelli@...hat.com, vschneid@...hat.com, 
	tglx@...nel.org, dietmar.eggemann@....com, anna-maria@...utronix.de, 
	frederic@...nel.org, wangyang.guo@...el.com
Subject: Re: [PATCH v4 1/3] sched/fair: Move checking for nohz cpus after time check

On Tue, 13 Jan 2026 at 12:18, Shrikanth Hegde <sshegde@...ux.ibm.com> wrote:
>
>
>
> On 1/13/26 3:51 PM, Vincent Guittot wrote:
> > On Tue, 13 Jan 2026 at 10:23, Shrikanth Hegde <sshegde@...ux.ibm.com> wrote:
> >>
> >>
> >>
> >> On 1/13/26 2:37 PM, Vincent Guittot wrote:
> >>> On Mon, 12 Jan 2026 at 06:05, Shrikanth Hegde <sshegde@...ux.ibm.com> wrote:
> >>>>
> >>>> NOHZ idle load balancer is kicked off only after time check. So move
> >>>> the atomic read after the time check to access it only when needed.
> >>>>
> >>>> When there are no idle CPUs (100% busy), even if the flag gets set to
> >>>> NOHZ_STATS_KICK | NOHZ_NEXT_KICK, find_new_ilb will fail and
> >>>> there will be no NOHZ idle balance. The current behaviour is retained.
> >>>>
> >>>> Note: This patch doesn't solve any cacheline overheads. No improvement
> >>>> in performance apart from saving a few cycles of atomic_read.
> >>>
> >>> But won't these cycles then be wasted by needlessly calling kick_ilb?
> >>>
> >>
> >> When there are nohz cpus, i.e. nohz.nr_cpus > 0, there is no change in code flow.
> >>
> >> Only when the system is 100% busy (which is expected to be rare), nohz.nr_cpus == 0,
> >> then it is expected that has_blocked_load = 0. So flags shouldn't be set.
> >
> > The way we are setting/clearing has_blocked_load vs
> > nr_cpus/idle_cpus_mask implies that it's possible to get
> > has_blocked_load == 1 but nr_cpus == 0 although it's a corner case and
> > not a default behavior
> >
> > No CPUs are idle: nr_cpus == 0
> >
> > CPU 0 enters idle
> >    - inc nr_cpus and set idle_cpus_mask
> >    - set nohz.has_blocked
> >
> > CPU0 wakes up
> >
> > Tick fires on CPU0
> >    - dec nr_cpus and clear idle_cpus_mask
> >    - nohz.has_blocked == 1 and most probably now > nohz.next_blocked.
> >      If now < nohz.next_balance, we skip the test of nr_cpus and call
> >      kick_ilb(), although nr_cpus == 0 and idle_cpus_mask is empty
> >
> >> Note we are still doing a return if nohz.nr_cpus == 0. So kick_ilb shouldn't be
> >> called.
> >
> > The return can be skipped by if (time_before(now, nohz.next_balance)) goto out
> >
>
> Assuming HZ=1000,
>
> I see LOAD_AVG_PERIOD = 32, whereas next_balance is usually 60, so it is
> a really narrow window of 28 ticks, with the system being close to 100% busy.
>
>
> >>
> >> Do you see any path still calling kick_ilb unnecessarily?
> >
> > Yes but at the same time it's clearly not the main case
> >
> >
>
> So I assume we can do this patch considering the common case?
> If not, let me know, I can drop it.

Yes, I suppose it's ok.
Could you add a comment in the commit so we remember this corner case?
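
For reference, the corner case discussed in this thread can be sketched as a
small user-space model. This is only an illustration under stated assumptions,
not the kernel's nohz_balancer_kick(): struct nohz_model, kick_flags() and the
check_nr_cpus_first switch are hypothetical names, the flag values are made up,
plain integer comparisons stand in for time_after()/time_before(), and the
runqueue and sched-domain checks that follow the time check are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; the kernel's actual values may differ. */
#define NOHZ_STATS_KICK 0x1u
#define NOHZ_NEXT_KICK  0x2u

/* Hypothetical snapshot of the nohz state read from the tick path. */
struct nohz_model {
	int nr_cpus;                /* CPUs currently in tickless idle */
	bool has_blocked;           /* blocked load may still need updating */
	bool needs_update;          /* next_balance needs to be refreshed */
	unsigned long next_blocked; /* earliest tick for a stats kick */
	unsigned long next_balance; /* earliest tick for an idle balance */
};

/*
 * Returns the flags that would be handed to kick_ilb(); 0 means no kick.
 * check_nr_cpus_first = true models the ordering before the patch,
 * false models the ordering after the patch.
 */
static unsigned int kick_flags(const struct nohz_model *n, unsigned long now,
			       bool check_nr_cpus_first)
{
	unsigned int flags = 0;

	assert(n->nr_cpus >= 0);

	/* Before the patch: bail out early when no CPU is tickless idle. */
	if (check_nr_cpus_first && n->nr_cpus == 0)
		return 0;

	if (n->has_blocked && now > n->next_blocked)
		flags = NOHZ_STATS_KICK;

	/* Too early for an idle balance: this jumps past the nr_cpus test. */
	if (now < n->next_balance)
		goto out;

	/* After the patch: nr_cpus is read only once the time check passed. */
	if (!check_nr_cpus_first && n->nr_cpus == 0)
		return 0;

	/* The remaining busy-CPU checks of the real function are omitted. */
out:
	if (n->needs_update)
		flags |= NOHZ_NEXT_KICK;

	return flags;
}
```

With nr_cpus == 0, has_blocked set, next_blocked = 32 and next_balance = 60,
a tick at now = 40 yields no flags with the old ordering but NOHZ_STATS_KICK
with the new one, which is the needless kick_ilb() call described above. Once
now reaches next_balance, or whenever nr_cpus > 0, both orderings agree.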
