Message-ID: <31ff4ffeb1f9448d8044f707f856c046@hygon.cn>
Date: Wed, 19 Mar 2025 09:42:28 +0000
From: Jianyong Wu <wujianyong@...on.cn>
To: Vincent Guittot <vincent.guittot@...aro.org>
CC: "mingo@...hat.com" <mingo@...hat.com>, "peterz@...radead.org"
	<peterz@...radead.org>, "jianyong.wu@...look.com" <jianyong.wu@...look.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] SCHED: scatter nohz idle balance target cpus



> -----Original Message-----
> From: Vincent Guittot <vincent.guittot@...aro.org>
> Sent: Wednesday, March 19, 2025 5:26 PM
> To: Jianyong Wu <wujianyong@...on.cn>
> Cc: mingo@...hat.com; peterz@...radead.org; jianyong.wu@...look.com;
> linux-kernel@...r.kernel.org
> Subject: Re: [PATCH] SCHED: scatter nohz idle balance target cpus
> 
> On Wed, 19 Mar 2025 at 10:03, Jianyong Wu <wujianyong@...on.cn> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Vincent Guittot <vincent.guittot@...aro.org>
> > > Sent: Wednesday, March 19, 2025 4:46 PM
> > > To: Jianyong Wu <wujianyong@...on.cn>
> > > Cc: mingo@...hat.com; peterz@...radead.org; jianyong.wu@...look.com;
> > > linux-kernel@...r.kernel.org
> > > Subject: Re: [PATCH] SCHED: scatter nohz idle balance target cpus
> > >
> > > On Tue, 18 Mar 2025 at 03:27, Jianyong Wu <wujianyong@...on.cn> wrote:
> > > >
> > > > Currently, the CPU selection logic for nohz idle balance keeps no
> > > > history, so CPU0 is always chosen whenever it is in the nohz cpu
> > > > mask. This is not fair to tasks residing on NUMA node 0, and it is
> > > > worse on machines with a large number of CPUs, where the nohz idle
> > > > balance concentrated on CPU0 may be very heavy.
> > >
> > > Could you provide more details about why it's not fair for tasks
> > > that reside on NUMA node 0? cpu0 is idle, so ilb doesn't steal time
> > > from other tasks.
> > >
> > > Do you have figures or use cases to highlight this unfairness?
> > >
> > [Jianyong Wu]
> > Yeah, here is a test case.
> > In a system with a large number of CPUs (256 in my scenario), when the
> > entire system is under low load, if you bind two or more CPU-bound jobs
> > to a single CPU other than CPU0, you'll notice that the softirq
> > utilization on CPU0 can reach approximately 10%, while it remains
> > negligible on the other CPUs. Checking /proc/softirqs makes it evident
> > that a significant number of SCHED softirqs are executed only on CPU0.
> 
> yes, but this 10% of softirq time would have been idle time otherwise so why
> should we care ?
> 
[Jianyong Wu]
However, this softirq time is proportional to the number of CPUs, and whenever tasks are scheduled onto CPU0 they will see added delays.
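The /proc/softirqs check described above can be sketched as follows. This is an illustrative snippet, not part of the patch: the sample text and counts are made up, and on a real machine you would read /proc/softirqs itself rather than the embedded string.

```python
# Hypothetical sketch: parse /proc/softirqs-style output and compute CPU0's
# share of SCHED softirqs, to quantify the concentration described above.
# The sample below is fabricated for illustration only.
SAMPLE = """\
                    CPU0       CPU1       CPU2       CPU3
          HI:          5          0          1          0
       TIMER:      90210      11000      10800      10950
       SCHED:     812345       1200       1150       1300
"""

def sched_counts(text):
    """Return the per-CPU SCHED softirq counts as a list of ints."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "SCHED:":
            return [int(n) for n in fields[1:]]
    return []

counts = sched_counts(SAMPLE)
# With the old behavior (scan always restarting at CPU0), the first
# nohz-idle housekeeping CPU absorbs nearly all ilb kicks:
print(counts[0] / sum(counts))
```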
 
> At the opposite, if all your cpus are busy, then cpu0 will do the busy
> load balance of the parent sched domain levels whereas other cpus will
> not, and this time is stolen from the task running on CPU0, which could
> be unfair compared to tasks running on other CPUs.
> 
[Jianyong Wu] 
Yeah. Do you think this patch can mitigate this?
> 
> > > >
> > > > To address this issue, add a member to "nohz" that records which
> > > > CPU was chosen last time, and start the search from the next CPU in
> > > > each round of nohz idle balance.
> > > >
> > > > Signed-off-by: Jianyong Wu <wujianyong@...on.cn>
> > > > ---
> > > >  kernel/sched/fair.c | 9 ++++++---
> > > >  1 file changed, 6 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index c798d2795243..ba6930c79e25 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -7197,6 +7197,7 @@ static struct {
> > > >         atomic_t nr_cpus;
> > > >         int has_blocked;                /* Idle CPUS has blocked load */
> > > >         int needs_update;               /* Newly idle CPUs need their next_balance collated */
> > > > +       int last_cpu;                   /* Last cpu chosen to do nohz idle balance */
> > > >         unsigned long next_balance;     /* in jiffy units */
> > > >         unsigned long next_blocked;     /* Next update of blocked load in jiffies */
> > > >  } nohz ____cacheline_aligned;
> > > > @@ -12266,13 +12267,15 @@ static inline int find_new_ilb(void)
> > > >
> > > >         hk_mask = housekeeping_cpumask(HK_TYPE_KERNEL_NOISE);
> > > >
> > > > -       for_each_cpu_and(ilb_cpu, nohz.idle_cpus_mask, hk_mask) {
> > > > +       for_each_cpu_wrap(ilb_cpu, nohz.idle_cpus_mask, nohz.last_cpu + 1) {
> > > >
> > > > -               if (ilb_cpu == smp_processor_id())
> > > > +               if (ilb_cpu == smp_processor_id() ||
> > > > +                   !cpumask_test_cpu(ilb_cpu, hk_mask))
> > > >                         continue;
> > > >
> > > > -               if (idle_cpu(ilb_cpu))
> > > > +               if (idle_cpu(ilb_cpu)) {
> > > > +                       nohz.last_cpu = ilb_cpu;
> > > >                         return ilb_cpu;
> > > > +               }
> > > >         }
> > > >
> > > >         return -1;
> > > > --
> > > > 2.43.0
> > > >
