lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87leil2r7v.ffs@tglx>
Date:   Fri, 21 Apr 2023 17:24:36 +0200
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Frederic Weisbecker <frederic@...nel.org>,
        Huacai Chen <chenhuacai@...nel.org>,
        WANG Xuerui <kernel@...0n.name>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        Anna-Maria Behnsen <anna-maria@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: Loongson (and other $ARCHs?) idle VS timer enqueue

On Fri, Apr 21 2023 at 14:36, Frederic Weisbecker wrote:
> I'm looking at the __arch_cpu_idle() implementation in Loongarch
> and I'm wondering about the rollback code. I don't understand well
> that code but with the help from PeterZ I might have seen something,
> so tell me if I'm wrong: when an interrupt happens within
> __arch_cpu_idle(), handle_vint() rolls back the return value to the
> beginning of __arch_cpu_idle(), so that TIF_NEED_RESCHED is checked
> again. Is that correct?
>
> Because if an interrupt fires while in __arch_cpu_idle(), that IRQ
> might enqueue a new timer and that new timer needs to be reprogrammed
> from the main idle loop and just checking TIF_NEED_RESCHED doesn't
> tell about that information.

The check for TIF_NEED_RESCHED as loop termination condition is simply
wrong. The architecture is not to supposed to loop in arch_cpu_idle().

That loop is from Linux 0.9 days. Seems MIPS::__r4k_wait() and
loongarch, which copied that muck are still stuck in the 1990'ies.

It has to return when an interrupt brings it out of the "idle wait"
instruction.

The special case are mwait() alike mechanisms which also return when a
monitored cacheline is written to. x86 uses that to spare the reseched
IPI as MWAIT will return when TIF_NEED_RESCHED is set by a remote CPU.

> More generally IRQs must _not_ be re-enabled between cpuidle_select()
> (or just tick_nohz_idle_stop_tick() if no cpuidle support) and the
> last halting ASM instruction. If that happens there must be
> a mechanism to cope with that and make sure we return to the main
> idle loop.

No. arch_cpu_idle() can safely reenable interrupts when the "wait"
instruction requires that. It has then to disable interrupts before
returning.

x86 default_idle() does: STI; HLT; CLI; That's perfectly fine because
the effect of STI is delayed to the HLT instruction boundary.

> Another way to cope with this would be to have:
>
> #define TIF_IDLE_TIMER	 ...
> #define TIF_IDLE_EXIT	 (TIF_NEED_RESCHED | TIF_IDLE_TIMER)

There is absolutely no need for this. arch_cpu_idle() has to return
after an interrupt, period. If MIPS/loongarch cannot do that then they
can have their private interrupt counter in that magic rollback ASM to
check for. But we really don't need a TIF flag which makes the (hr)timer
enqueue path more complex.

> I'm trying to find an automated way to debug this kind of issue but
> it's not easy...

It's far from trivial because you'd need correlation between the
interrupt entry and the enter to and return from arch_cpu_idle().

I fear manual inspection is the main tool here :(

Thanks,

        tglx

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ