Message-ID: <20201130075651.GJ2414@hirez.programming.kicks-ass.net>
Date: Mon, 30 Nov 2020 08:56:51 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Thomas Gleixner <tglx@...utronix.de>,
"Paul E. McKenney" <paulmck@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
the arch/x86 maintainers <x86@...nel.org>
Subject: Re: [GIT pull] locking/urgent for v5.10-rc6
On Sun, Nov 29, 2020 at 11:31:41AM -0800, Linus Torvalds wrote:
> On Sun, Nov 29, 2020 at 5:38 AM Thomas Gleixner <tglx@...utronix.de> wrote:
> >
> > Yet two more places which invoke tracing from RCU disabled regions in the
> > idle path. Similar to the entry path the low level idle functions have to
> > be non-instrumentable.
>
> This really seems less than optimal.
>
> In particular, lookie here:
>
> > @@ -94,9 +94,35 @@ void __cpuidle default_idle_call(void)
> >
> > trace_cpu_idle(1, smp_processor_id());
> > stop_critical_timings();
> > +
> > + /*
> > + * arch_cpu_idle() is supposed to enable IRQs, however
> > + * we can't do that because of RCU and tracing.
> > + *
> > + * Trace IRQs enable here, then switch off RCU, and have
> > + * arch_cpu_idle() use raw_local_irq_enable(). Note that
> > + * rcu_idle_enter() relies on lockdep IRQ state, so switch that
> > + * last -- this is very similar to the entry code.
> > + */
> > + trace_hardirqs_on_prepare();
> > + lockdep_hardirqs_on_prepare(_THIS_IP_);
> > rcu_idle_enter();
> > + lockdep_hardirqs_on(_THIS_IP_);
> > +
> > arch_cpu_idle();
> > +
> > + /*
> > + * OK, so IRQs are enabled here, but RCU needs them disabled to
> > + * turn itself back on.. funny thing is that disabling IRQs
> > + * will cause tracing, which needs RCU. Jump through hoops to
> > + * make it 'work'.
> > + */
> > + raw_local_irq_disable();
> > + lockdep_hardirqs_off(_THIS_IP_);
> > rcu_idle_exit();
> > + lockdep_hardirqs_on(_THIS_IP_);
> > + raw_local_irq_enable();
> > +
> > start_critical_timings();
> > trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
> > }
>
> And look at what the code generation for the idle exit path is when
> lockdep isn't even on.
Agreed.

The idea was to flip all of arch_cpu_idle() to not enable interrupts.

This is suboptimal for things like x86 where arch_cpu_idle() is
basically STI;HLT, but x86 isn't likely to actually use this code path
anyway, given all the various cpuidle drivers it has.

Many of the other archs are now doing things like arm's:
wfi();raw_local_irq_enable().

Doing that tree-wide interrupt-state flip was something I didn't want to
do at this late a stage, the chance of messing that up is just too high.

After that I need to go look at flipping cpuidle, which is even more
'interesting'. cpuidle_enter() has the exact same semantics, and this is
the code path that x86 actually uses, and here it's inconsistent at best.
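
For illustration only, the ordering constraints discussed above can be
modelled in userspace C. This is a rough sketch and not kernel code: every
name prefixed with model_ is hypothetical, the "flipped" variant is just one
reading of what arch_cpu_idle() leaving interrupts disabled could look like,
and the only rules modelled are the two stated in the patch comments
(tracing needs RCU watching; RCU idle enter/exit needs IRQs disabled).

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of this is a real kernel interface. */
struct model_cpu {
	bool irqs_enabled;	/* modelled hardware interrupt flag */
	bool rcu_watching;	/* false while RCU considers the CPU idle */
	int violations;		/* number of broken ordering rules */
};

static void model_irq_enable(struct model_cpu *c)  { c->irqs_enabled = true; }
static void model_irq_disable(struct model_cpu *c) { c->irqs_enabled = false; }

/* Stands in for any tracing hook: it is only legal while RCU is watching. */
static void model_trace(struct model_cpu *c)
{
	if (!c->rcu_watching)
		c->violations++;
}

/* RCU idle transitions are only legal with IRQs disabled. */
static void model_rcu_idle_enter(struct model_cpu *c)
{
	if (c->irqs_enabled)
		c->violations++;
	c->rcu_watching = false;
}

static void model_rcu_idle_exit(struct model_cpu *c)
{
	if (c->irqs_enabled)
		c->violations++;
	c->rcu_watching = true;
}

/* Current convention: the arch idle routine returns with IRQs enabled. */
static void model_arch_idle_enables_irqs(struct model_cpu *c)
{
	model_irq_enable(c);
}

/* Assumed flipped convention: the arch idle routine leaves IRQs disabled. */
static void model_arch_idle_keeps_irqs_off(struct model_cpu *c)
{
	(void)c;
}

/* Naive caller: a traced IRQ disable runs while RCU is not watching. */
static int model_idle_call_naive(struct model_cpu *c)
{
	model_rcu_idle_enter(c);
	model_arch_idle_enables_irqs(c);
	model_trace(c);			/* traced disable: rule broken here */
	model_irq_disable(c);
	model_rcu_idle_exit(c);
	model_irq_enable(c);
	return c->violations;
}

/* Shape of the quoted patch: trace early, then use untraced IRQ toggles. */
static int model_idle_call_patched(struct model_cpu *c)
{
	model_trace(c);			/* trace "IRQs on" before RCU goes idle */
	model_rcu_idle_enter(c);
	model_arch_idle_enables_irqs(c);
	model_irq_disable(c);		/* untraced, like raw_local_irq_disable() */
	model_rcu_idle_exit(c);
	model_irq_enable(c);		/* untraced, like raw_local_irq_enable() */
	model_trace(c);			/* RCU is watching again, tracing is fine */
	return c->violations;
}

/* Flipped caller: no disable/enable round trip around the RCU exit. */
static int model_idle_call_flipped(struct model_cpu *c)
{
	model_rcu_idle_enter(c);
	model_arch_idle_keeps_irqs_off(c);
	model_rcu_idle_exit(c);
	model_trace(c);			/* traced enable, RCU is watching */
	model_irq_enable(c);
	return c->violations;
}
```

Each caller expects to start with IRQs disabled and RCU watching. In this
model the naive ordering records one violation, while the patched and the
flipped orderings record none; the flipped one simply gets there without the
extra disable/enable pair that the quoted hunk adds after arch_cpu_idle().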