lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZELADFhjWR2Swn3l@lothringen>
Date:   Fri, 21 Apr 2023 18:55:40 +0200
From:   Frederic Weisbecker <frederic@...nel.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Huacai Chen <chenhuacai@...nel.org>,
        WANG Xuerui <kernel@...0n.name>,
        Peter Zijlstra <peterz@...radead.org>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        Anna-Maria Behnsen <anna-maria@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: Loongson (and other $ARCHs?) idle VS timer enqueue

On Fri, Apr 21, 2023 at 05:24:36PM +0200, Thomas Gleixner wrote:
> On Fri, Apr 21 2023 at 14:36, Frederic Weisbecker wrote:
> The check for TIF_NEED_RESCHED as loop termination condition is simply
> wrong. The architecture is not to supposed to loop in arch_cpu_idle().
> 
> That loop is from Linux 0.9 days. Seems MIPS::__r4k_wait() and
> loongarch, which copied that muck are still stuck in the 1990'ies.
> 
> It has to return when an interrupt brings it out of the "idle wait"
> instruction.
> 
> The special case are mwait() alike mechanisms which also return when a
> monitored cacheline is written to. x86 uses that to spare the reseched
> IPI as MWAIT will return when TIF_NEED_RESCHED is set by a remote CPU.

Right.

> 
> > More generally IRQs must _not_ be re-enabled between cpuidle_select()
> > (or just tick_nohz_idle_stop_tick() if no cpuidle support) and the
> > last halting ASM instruction. If that happens there must be
> > a mechanism to cope with that and make sure we return to the main
> > idle loop.
> 
> No. arch_cpu_idle() can safely reenable interrupts when the "wait"
> instruction requires that. It has then to disable interrupts before
> returning.
> 
> x86 default_idle() does: STI; HLT; CLI; That's perfectly fine because
> the effect of STI is delayed to the HLT instruction boundary.

Right, I implicitly included sti;mwait and sti;hlt
The point is that if interrupts are enabled too early before the
idling instruction then we are screwed.

> 
> > Another way to cope with this would be to have:
> >
> > #define TIF_IDLE_TIMER	 ...
> > #define TIF_IDLE_EXIT	 (TIF_NEED_RESCHED | TIF_IDLE_TIMER)
> 
> There is absolutely no need for this. arch_cpu_idle() has to return
> after an interrupt, period. If MIPS/loongarch cannot do that then they
> can have their private interrupt counter in that magic rollback ASM to
> check for. But we really don't need a TIF flag which makes the (hr)timer
> enqueue path more complex.

Then I'm relieved :)  (well sort-of, given the risk for an accident somewhere
on an arch or a cpuidle driver I may have overlooked).

> 
> > I'm trying to find an automated way to debug this kind of issue but
> > it's not easy...
> 
> It's far from trivial because you'd need correlation between the
> interrupt entry and the enter to and return from arch_cpu_idle().
> 
> I fear manual inspection is the main tool here :(

I thought so :)

I'm already halfway through the architectures, then will come the cpuidle drivers...

Thanks.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ