Message-ID: <7a845b43-bd8e-6c7d-6bca-2e6f174f671@inria.fr>
Date: Fri, 5 Jan 2024 17:39:12 +0100 (CET)
From: Julia Lawall <julia.lawall@...ia.fr>
To: Vincent Guittot <vincent.guittot@...aro.org>
cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>, Mel Gorman <mgorman@...e.de>,
linux-kernel@...r.kernel.org
Subject: Re: EEVDF and NUMA balancing

On Fri, 5 Jan 2024, Vincent Guittot wrote:

> On Fri, 5 Jan 2024 at 15:51, Julia Lawall <julia.lawall@...ia.fr> wrote:
> >
> > > Your system is calling the polling mode and not the default
> > > cpuidle_idle_call() ? This could explain why I don't see such problem
> > > on my system which doesn't have polling
> > >
> > > Are you forcing the use of polling mode ?
> > > If yes, could you check that this problem disappears without forcing
> > > polling mode ?
> >
> > I expanded the code in do_idle to:
> >
> >     if (cpu_idle_force_poll) {
> >         c1++;
> >         tick_nohz_idle_restart_tick();
> >         cpu_idle_poll();
> >     } else if (tick_check_broadcast_expired()) {
> >         c2++;
> >         tick_nohz_idle_restart_tick();
> >         cpu_idle_poll();
> >     } else {
> >         c3++;
> >         cpuidle_idle_call();
> >     }
> >
> > Later, I have:
> >
> >     trace_printk("force poll: %d: c1: %d, c2: %d, c3: %d\n",
> >                  cpu_idle_force_poll, c1, c2, c3);
> >     flush_smp_call_function_queue();
> >     schedule_idle();
> >
> > force poll, c1 and c2 are always 0, and c3 is always some non-zero value.
> > Sometimes small (often 1), and sometimes large (304 or 305).
> >
> > So I don't think it's calling cpu_idle_poll().
>
> I agree that something else
>
> >
> > x86 has TIF_POLLING_NRFLAG defined to be a non zero value, which I think
> > is sufficient to cause the issue.
>
> Could you trace trace_sched_wake_idle_without_ipi() and csd traces as well?
> I don't understand what sets need_resched() in your case, bearing in
> mind that I don't see the problem on my Arm systems and IIRC Peter
> said that he didn't face the problem on his x86 system.

TIF_POLLING_NRFLAG doesn't seem to be defined on Arm.

Peter said that he didn't see the problem, but perhaps that was just
chance. The problem requires a NUMA move to occur. I do 20 runs to be
sure to see it at least once. But another machine might behave
differently.

I believe the call chain is:

  scheduler_tick
  trigger_load_balance
  nohz_balancer_kick
  kick_ilb
  smp_call_function_single_async
  generic_exec_single
  __smp_call_single_queue
  send_call_function_single_ipi
  call_function_single_prep_ipi
  set_nr_if_polling  <====== sets need_resched

I'll make a trace to reverify that.
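
To make the last two steps of that chain concrete, here is a minimal userspace sketch of what I understand set_nr_if_polling() and call_function_single_prep_ipi() to do. It is a model under assumptions, not the kernel code: the struct, the model_* function names and the flag bit values are hypothetical, and C11 atomics stand in for the kernel's cmpxchg on the thread_info flags.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit values, for illustration only; the real ones come
 * from the architecture's thread_info.h. */
#define MODEL_TIF_NEED_RESCHED   (1u << 0)
#define MODEL_TIF_POLLING_NRFLAG (1u << 1)

/* Userspace stand-in for the flags word of a CPU's idle task. */
struct model_thread_info {
	atomic_uint flags;
};

/*
 * Model of set_nr_if_polling(): if the idle task advertises that it is
 * polling its flags word, set NEED_RESCHED atomically and return true.
 * Return false if it is not polling, so the caller has to send an IPI.
 */
static bool model_set_nr_if_polling(struct model_thread_info *ti)
{
	unsigned int val = atomic_load(&ti->flags);

	do {
		if (!(val & MODEL_TIF_POLLING_NRFLAG))
			return false;	/* not polling */
		if (val & MODEL_TIF_NEED_RESCHED)
			return true;	/* already marked */
		/* on failure, val is reloaded and the checks are redone */
	} while (!atomic_compare_exchange_weak(&ti->flags, &val,
					       val | MODEL_TIF_NEED_RESCHED));
	return true;
}

/*
 * Model of call_function_single_prep_ipi(): returns true when a real
 * IPI is still required, false when setting NEED_RESCHED was enough.
 */
static bool model_call_function_single_prep_ipi(struct model_thread_info *ti)
{
	return !model_set_nr_if_polling(ti);
}
```

With only the polling bit set in flags, model_call_function_single_prep_ipi() returns false and leaves NEED_RESCHED set. That is the case I suspect here: the idle CPU sees need_resched without an IPI having been sent. With the polling bit clear, the flags are left untouched and the function returns true.
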
julia