linux-kernel - Re: Loongson (and other $ARCHs?) idle VS timer enqueue

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZEK+IeTYsauHLozy@lothringen>
Date:   Fri, 21 Apr 2023 18:47:29 +0200
From:   Frederic Weisbecker <frederic@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Huacai Chen <chenhuacai@...nel.org>,
        WANG Xuerui <kernel@...0n.name>,
        Thomas Gleixner <tglx@...utronix.de>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        Anna-Maria Behnsen <anna-maria@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: Loongson (and other $ARCHs?) idle VS timer enqueue

On Fri, Apr 21, 2023 at 04:24:46PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 21, 2023 at 02:36:52PM +0200, Frederic Weisbecker wrote:
> > Hi,
> > 
> > I'm looking at the __arch_cpu_idle() implementation in Loongarch
> > and I'm wondering about the rollback code. I don't understand well
> > that code but with the help from PeterZ I might have seen something,
> > so tell me if I'm wrong: when an interrupt happens within
> > __arch_cpu_idle(), handle_vint() rolls back the return value to the
> > beginning of __arch_cpu_idle(), so that TIF_NEED_RESCHED is checked
> > again. Is that correct?
> 
> Loongson copied this crap from MIPS, so they are direct affected too.

Right.

> 
> > Because if an interrupt fires while in __arch_cpu_idle(), that IRQ
> > might enqueue a new timer and that new timer needs to be reprogrammed
> > from the main idle loop and just checking TIF_NEED_RESCHED doesn't
> > tell about that information.
> 
> Notably; this is only relevant to NOHZ, right?

Indeed.

> > And set that from the timer enqueue in idle time and check TIF_IDLE_EXIT
> > on idle callback. It depends how many architectures are concerned by this.
> > All I know so far is:
> 
> The alternative is changing kernel/entry/common.c:irqentry_exit() to add
> a nohz callback next to ct_irq_exit(), and have that reprogram the timer
> if/when we're in NOHZ mode.

We used to do that but Rafael rewrote the thing a few years ago in order for
the cpuidle governor to know about the next timer event as a heuristic to
predict the best c-state, and actually decide if it's worth stopping the
tick.

So cpuidle_select() eventually calls tick_nohz_get_sleep_length() in the
beginning of the idle loop to know the next timer event (without stopping the
tick yet), on top of that and other informations, tick is stopped or not
(cf: stop_tick argument in cpuidle_select()).

If an IRQ wakes up the CPU and queues a timer, we need to go through that
whole process again, otherwise we shortcut cpuidle C-state update.

> *HOWEVER*
> 
> intel_idle_irq() is affected -- except that we only (normally) use that
> for very shallow idle states and it won't interact with NOHZ (because we
> only disable the tick for deep idle states).

Well I don't know, that doesn't look comfortable... :)

Also why does it need to enable IRQs if ecx=1 ?

> > * Need to check all other archs
> > 
> > I'm trying to find an automated way to debug this kind of issue but it's not
> > easy...
> 
> Yeah, too much arch code :/ Easiest might be to check if our idea of
> where the timer should be and the hardware agree on IRQ entry or so --
> *any* IRQ. That will miss a lot of cases, but at least it's something.

Hmm, not sure I understand what you're suggesting...

Thanks.