[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZEK+IeTYsauHLozy@lothringen>
Date: Fri, 21 Apr 2023 18:47:29 +0200
From: Frederic Weisbecker <frederic@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Huacai Chen <chenhuacai@...nel.org>,
WANG Xuerui <kernel@...0n.name>,
Thomas Gleixner <tglx@...utronix.de>,
"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
Anna-Maria Behnsen <anna-maria@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: Loongson (and other $ARCHs?) idle VS timer enqueue
On Fri, Apr 21, 2023 at 04:24:46PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 21, 2023 at 02:36:52PM +0200, Frederic Weisbecker wrote:
> > Hi,
> >
> > I'm looking at the __arch_cpu_idle() implementation in Loongarch
> > and I'm wondering about the rollback code. I don't understand well
> > that code but with the help from PeterZ I might have seen something,
> > so tell me if I'm wrong: when an interrupt happens within
> > __arch_cpu_idle(), handle_vint() rolls back the return value to the
> > beginning of __arch_cpu_idle(), so that TIF_NEED_RESCHED is checked
> > again. Is that correct?
>
> Loongson copied this crap from MIPS, so they are direct affected too.
Right.
>
> > Because if an interrupt fires while in __arch_cpu_idle(), that IRQ
> > might enqueue a new timer and that new timer needs to be reprogrammed
> > from the main idle loop and just checking TIF_NEED_RESCHED doesn't
> > tell about that information.
>
> Notably; this is only relevant to NOHZ, right?
Indeed.
> > And set that from the timer enqueue in idle time and check TIF_IDLE_EXIT
> > on idle callback. It depends how many architectures are concerned by this.
> > All I know so far is:
>
> The alternative is changing kernel/entry/common.c:irqentry_exit() to add
> a nohz callback next to ct_irq_exit(), and have that reprogram the timer
> if/when we're in NOHZ mode.
We used to do that but Rafael rewrote the thing a few years ago in order for
the cpuidle governor to know about the next timer event as a heuristic to
predict the best c-state, and actually decide if it's worth stopping the
tick.
So cpuidle_select() eventually calls tick_nohz_get_sleep_length() in the
beginning of the idle loop to know the next timer event (without stopping the
tick yet), on top of that and other informations, tick is stopped or not
(cf: stop_tick argument in cpuidle_select()).
If an IRQ wakes up the CPU and queues a timer, we need to go through that
whole process again, otherwise we shortcut cpuidle C-state update.
> *HOWEVER*
>
> intel_idle_irq() is affected -- except that we only (normally) use that
> for very shallow idle states and it won't interact with NOHZ (because we
> only disable the tick for deep idle states).
Well I don't know, that doesn't look comfortable... :)
Also why does it need to enable IRQs if ecx=1 ?
> > * Need to check all other archs
> >
> > I'm trying to find an automated way to debug this kind of issue but it's not
> > easy...
>
> Yeah, too much arch code :/ Easiest might be to check if our idea of
> where the timer should be and the hardware agree on IRQ entry or so --
> *any* IRQ. That will miss a lot of cases, but at least it's something.
Hmm, not sure I understand what you're suggesting...
Thanks.
Powered by blists - more mailing lists