Message-ID: <7231bfb1-9acc-656-c6b6-20bd8624e08a@inria.fr>
Date: Thu, 18 Jan 2024 18:43:19 +0100 (CET)
From: Julia Lawall <julia.lawall@...ia.fr>
To: Vincent Guittot <vincent.guittot@...aro.org>
cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, 
    Dietmar Eggemann <dietmar.eggemann@....com>, Mel Gorman <mgorman@...e.de>, 
    linux-kernel@...r.kernel.org
Subject: Re: EEVDF and NUMA balancing



On Thu, 18 Jan 2024, Vincent Guittot wrote:

> On Thu, 18 Jan 2024 at 17:50, Julia Lawall <julia.lawall@...ia.fr> wrote:
> >
> >
> >
> > On Thu, 18 Jan 2024, Vincent Guittot wrote:
> >
> > > Hi Julia,
> > >
> > > Sorry for the delay. I have been involved in other perf regressions.
> > >
> > > On Fri, 5 Jan 2024 at 18:27, Julia Lawall <julia.lawall@...ia.fr> wrote:
> > > >
> > > >
> > > >
> > > > On Fri, 5 Jan 2024, Julia Lawall wrote:
> > > >
> > > > >
> > > > >
> > > > > On Fri, 5 Jan 2024, Vincent Guittot wrote:
> > > > >
> > > > > > On Fri, 5 Jan 2024 at 15:51, Julia Lawall <julia.lawall@...ia.fr> wrote:
> > > > > > >
> > > > > > > > Your system is calling the polling mode and not the default
> > > > > > > > cpuidle_idle_call() ? This could explain why I don't see such problem
> > > > > > > > on my system which doesn't have polling
> > > > > > > >
> > > > > > > > Are you forcing the use of polling mode ?
> > > > > > > > If yes, could you check that this problem disappears without forcing
> > > > > > > > polling mode ?
> > > > > > >
> > > > > > > I expanded the code in do_idle to:
> > > > > > >
> > > > > > >                 if (cpu_idle_force_poll) { c1++;
> > > > > > >                         tick_nohz_idle_restart_tick();
> > > > > > >                         cpu_idle_poll();
> > > > > > >                 } else if (tick_check_broadcast_expired()) { c2++;
> > > > > > >                         tick_nohz_idle_restart_tick();
> > > > > > >                         cpu_idle_poll();
> > > > > > >                 } else { c3++;
> > > > > > >                         cpuidle_idle_call();
> > > > > > >                 }
> > > > > > >
> > > > > > > Later, I have:
> > > > > > >
> > > > > > >         trace_printk("force poll: %d: c1: %d, c2: %d, c3: %d\n",cpu_idle_force_poll, c1, c2, c3);
> > > > > > >         flush_smp_call_function_queue();
> > > > > > >         schedule_idle();
> > > > > > >
> > > > > > > force poll, c1 and c2 are always 0, and c3 is always some non-zero value.
> > > > > > > Sometimes small (often 1), and sometimes large (304 or 305).
> > > > > > >
> > > > > > > So I don't think it's calling cpu_idle_poll().
> > > > > >
> > > > > > I agree that something else
> > > > > >
> > > > > > >
> > > > > > > x86 has TIF_POLLING_NRFLAG defined to be a non zero value, which I think
> > > > > > > is sufficient to cause the issue.
> > > > > >
> > > > > > Could you trace trace_sched_wake_idle_without_ipi() and csd traces as well?
> > > > > > I don't understand what sets need_resched() in your case, bearing in
> > > > > > mind that I don't see the problem on my Arm systems and IIRC Peter
> > > > > > said that he didn't face the problem on his x86 system.
> > > > >
> > > > > TIF_POLLING_NRFLAG doesn't seem to be defined on Arm.
> > > > >
> > > > > Peter said that he didn't see the problem, but perhaps that was just
> > > > > random.  It requires a NUMA move to occur.  I make 20 runs to be sure to
> > > > > see the problem at least once.  But another machine might behave
> > > > > differently.
> > > > >
> > > > > I believe the call chain is:
> > > > >
> > > > > scheduler_tick
> > > > >   trigger_load_balance
> > > > >     nohz_balancer_kick
> > > > >       kick_ilb
> > > > >         smp_call_function_single_async
> > > > >           generic_exec_single
> > > > >             __smp_call_single_queue
> > > > >               send_call_function_single_ipi
> > > > >                 call_function_single_prep_ipi
> > > > >                   set_nr_if_polling <====== sets need_resched
> > > > >
> > > > > I'll make a trace to reverify that.
> > > >
> > > > This is what I see at a tick, which corresponds to the call chain shown
> > > > above:
> > > >
> > > >           bt.B.x-4184  [046]   466.410605: bputs:                scheduler_tick: calling trigger_load_balance
> > > >           bt.B.x-4184  [046]   466.410605: bputs:                trigger_load_balance: calling nohz_balancer_kick
> > > >           bt.B.x-4184  [046]   466.410605: bputs:                trigger_load_balance: calling kick_ilb
> > > >           bt.B.x-4184  [046]   466.410607: bprint:               trigger_load_balance: calling smp_call_function_single_async 22
> > > >           bt.B.x-4184  [046]   466.410607: bputs:                smp_call_function_single_async: calling generic_exec_single
> > > >           bt.B.x-4184  [046]   466.410607: bputs:                generic_exec_single: calling __smp_call_single_queue
> > > >           bt.B.x-4184  [046]   466.410608: bputs:                __smp_call_single_queue: calling send_call_function_single_ipi
> > > >           bt.B.x-4184  [046]   466.410608: bputs:                __smp_call_single_queue: calling call_function_single_prep_ipi
> > > >           bt.B.x-4184  [046]   466.410608: bputs:                call_function_single_prep_ipi: calling set_nr_if_polling
> > > >           bt.B.x-4184  [046]   466.410609: sched_wake_idle_without_ipi: cpu=22
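
To make the last two trace lines concrete, here is a small userspace model of
what I understand set_nr_if_polling and call_function_single_prep_ipi to be
doing. It is only a sketch using C11 atomics: the flag values and the model_*
names are made up for illustration, and the real kernel code works on
thread_info flags with its own helpers.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit values; the kernel's real ones are arch-specific. */
#define MODEL_TIF_NEED_RESCHED   (1UL << 0)
#define MODEL_TIF_POLLING_NRFLAG (1UL << 1)

/*
 * Model of set_nr_if_polling: if the target is polling on its flags,
 * set NEED_RESCHED atomically and report that the flag write is enough.
 * If it is not polling, leave the flags alone and report failure.
 */
bool model_set_nr_if_polling(atomic_ulong *flags)
{
    unsigned long val = atomic_load(flags);

    do {
        if (!(val & MODEL_TIF_POLLING_NRFLAG))
            return false;   /* not polling: caller must send an IPI */
        if (val & MODEL_TIF_NEED_RESCHED)
            return true;    /* already flagged: nothing more to do */
    } while (!atomic_compare_exchange_weak(flags, &val,
                                           val | MODEL_TIF_NEED_RESCHED));

    return true;            /* flag set: target wakes without an IPI */
}

/*
 * Model of the sender side: an IPI is needed only when the target
 * could not be woken by setting NEED_RESCHED.
 */
bool model_ipi_needed(atomic_ulong *flags)
{
    return !model_set_nr_if_polling(flags);
}
```

In this model, a target whose flags contain MODEL_TIF_POLLING_NRFLAG ends up
with MODEL_TIF_NEED_RESCHED set and no IPI is sent, which is the case that the
sched_wake_idle_without_ipi line above reports for cpu 22.
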
> > >
> > > I don't know if you have made progress on this in the meantime.
> >
> > Not really.  Basically after do_idle, there is the call to
> > flush_smp_call_function_queue that invokes the deposited functions, which
> > in our case is at best going to raise a softirq, and the call to schedule.
> > Raising a softirq doesn't happen because of the check for need_resched.
> > But even if that test were removed, it would still not be useful because
> > there would be the ksoftirqd running on the idle core that would eliminate
> > the imbalance between the sockets.  Maybe there could be some way to call
> > run_rebalance_domains directly from nohz_csd_func, since
> > run_rebalance_domains doesn't use its argument, but at the moment
> > run_rebalance_domains is not visible to nohz_csd_func.
>
> All this happens because we don't use an IPI; with an IPI it should
> not use ksoftirqd.
>
> >
> > >
> > > Regarding the trace above, do you know if anything happens on CPU22
> > > just before the scheduler tried to kick the ILB on it ?
> >
> > I don't think so.  It's idle.
>
> OK, so if it is idle for a while, I mean nothing happened on it, not
> even a spurious irq, it should have cleared its TIF_POLLING_NRFLAG.
>
> It would be good to trace the selected idle state.
>
> >
> > > Have you found why TIF_POLLING_NRFLAG seems to be always set when the
> > > kick_ilb happens ? It should be cleared once entering the idle state.
> >
> > Actually, I don't think it is always set.  It switches back and forth
> > between two cases.  I will look for the traces that show that.
> >
> > > Could you check your cpuidle driver ?
> >
> > Check what specifically?
>
> $ cat /sys/devices/system/cpu/cpuidle/current_driver
> $ cat /sys/devices/system/cpu/cpuidle/current_governor

The driver is intel_idle and the governor is menu.
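
In case it is useful for collecting this on several machines, the same check
can be done from a small C helper. This is only a sketch: the two sysfs paths
are the ones quoted above, the function names are made up, and the files may
be absent on kernels built without cpuidle.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a text file into buf and strip the trailing
 * newline. Returns 0 on success, -1 if the file cannot be opened or read.
 */
int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Print the cpuidle driver and governor, or a note if they are missing. */
void print_cpuidle_setup(void)
{
    char driver[64], governor[64];

    if (read_first_line("/sys/devices/system/cpu/cpuidle/current_driver",
                        driver, sizeof(driver)) == 0)
        printf("driver: %s\n", driver);
    else
        printf("driver: (not available)\n");

    if (read_first_line("/sys/devices/system/cpu/cpuidle/current_governor",
                        governor, sizeof(governor)) == 0)
        printf("governor: %s\n", governor);
    else
        printf("governor: (not available)\n");
}
```

Calling print_cpuidle_setup() from main prints one line for the driver and one
for the governor.
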

julia
