[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZELADFhjWR2Swn3l@lothringen>
Date: Fri, 21 Apr 2023 18:55:40 +0200
From: Frederic Weisbecker <frederic@...nel.org>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Huacai Chen <chenhuacai@...nel.org>,
WANG Xuerui <kernel@...0n.name>,
Peter Zijlstra <peterz@...radead.org>,
"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
Anna-Maria Behnsen <anna-maria@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: Loongson (and other $ARCHs?) idle VS timer enqueue
On Fri, Apr 21, 2023 at 05:24:36PM +0200, Thomas Gleixner wrote:
> On Fri, Apr 21 2023 at 14:36, Frederic Weisbecker wrote:
> The check for TIF_NEED_RESCHED as loop termination condition is simply
> wrong. The architecture is not to supposed to loop in arch_cpu_idle().
>
> That loop is from Linux 0.9 days. Seems MIPS::__r4k_wait() and
> loongarch, which copied that muck are still stuck in the 1990'ies.
>
> It has to return when an interrupt brings it out of the "idle wait"
> instruction.
>
> The special case are mwait() alike mechanisms which also return when a
> monitored cacheline is written to. x86 uses that to spare the reseched
> IPI as MWAIT will return when TIF_NEED_RESCHED is set by a remote CPU.
Right.
>
> > More generally IRQs must _not_ be re-enabled between cpuidle_select()
> > (or just tick_nohz_idle_stop_tick() if no cpuidle support) and the
> > last halting ASM instruction. If that happens there must be
> > a mechanism to cope with that and make sure we return to the main
> > idle loop.
>
> No. arch_cpu_idle() can safely reenable interrupts when the "wait"
> instruction requires that. It has then to disable interrupts before
> returning.
>
> x86 default_idle() does: STI; HLT; CLI; That's perfectly fine because
> the effect of STI is delayed to the HLT instruction boundary.
Right, I implicitly included sti;mwait and sti;hlt
The point is that if interrupts are enabled too early before the
idling instruction then we are screwed.
>
> > Another way to cope with this would be to have:
> >
> > #define TIF_IDLE_TIMER ...
> > #define TIF_IDLE_EXIT (TIF_NEED_RESCHED | TIF_IDLE_TIMER)
>
> There is absolutely no need for this. arch_cpu_idle() has to return
> after an interrupt, period. If MIPS/loongarch cannot do that then they
> can have their private interrupt counter in that magic rollback ASM to
> check for. But we really don't need a TIF flag which makes the (hr)timer
> enqueue path more complex.
Then I'm relieved :) (well sort-of, given the risk for an accident somewhere
on an arch or a cpuidle driver I may have overlooked).
>
> > I'm trying to find an automated way to debug this kind of issue but
> > it's not easy...
>
> It's far from trivial because you'd need correlation between the
> interrupt entry and the enter to and return from arch_cpu_idle().
>
> I fear manual inspection is the main tool here :(
I thought so :)
I'm already halfway through the architectures, then will come the cpuidle drivers...
Thanks.
Powered by blists - more mailing lists