Message-ID: <3903a508c15e7a75b6d637c8523c3bae13d6a7af.camel@amazon.com>
Date: Thu, 1 Jun 2023 07:24:48 +0000
From: "Gowans, James" <jgowans@...zon.com>
To: "maz@...nel.org" <maz@...nel.org>
CC: "tglx@...utronix.de" <tglx@...utronix.de>,
"Raslan, KarimAllah" <karahmed@...zon.com>,
"liaochang1@...wei.com" <liaochang1@...wei.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"zouyipeng@...wei.com" <zouyipeng@...wei.com>,
"chris.zjh@...wei.com" <chris.zjh@...wei.com>
Subject: Re: [PATCH 2/2] genirq: fasteoi resends interrupt on concurrent
invoke
On Wed, 2023-05-31 at 08:00 +0100, Marc Zyngier wrote:
> > Generally it should not be possible for the next interrupt to arrive
> > while the previous handler is still running: the next interrupt should
> > only arrive after the EOI message has been sent and the previous handler
> > has returned.
>
> There is no such message with LPIs. I pointed that out previously.
Arg, thanks, I'll re-word this to:
"Generally it should not be possible for the next interrupt to arrive
while the previous handler is still running: the CPU will not preempt an
interrupt with another from the same source or same priority."
I hope that's more accurate?
> > This issue was observed specifically on an arm64 system with a GIC-v3
> > handling MSIs; GIC-v3 uses the handle_fasteoi_irq handler. The issue is
> > that the global ITS is responsible for affinity but does not know
> > whether interrupts are pending/running, only the CPU-local redistributor
> > handles the EOI. Hence when the affinity is changed in the ITS, the new
> > CPU's redistributor does not know that the original CPU is still running
> > the handler.
>
> Similar to your previous patch, you don't explain *why* the interrupt
> gets delivered when it is an LPI, and not for any of the other GICv3
> interrupt types. That's an important point.
Right, you pointed out the issue with this sentence too and I missed
updating it. :-/ How about:
"This issue was observed specifically on an arm64 system with a GIC-v3
handling MSIs; GIC-v3 uses the handle_fasteoi_irq handler. The issue is
that the GIC-v3's physical LPIs do not have a global active state. If LPIs
had an active state, then an LPI could not be retriggered until the first
CPU had issued a deactivation."
>
> >
> > + /*
> > + * When the race described above happens, this will resend the interrupt.
> > + */
> > + if (unlikely(desc->istate & IRQS_PENDING))
> > + check_irq_resend(desc, false);
> > +
> > raw_spin_unlock(&desc->lock);
> > return;
> > out:
>
> While I'm glad that you eventually decided to use the resend mechanism
> instead of spinning on the "old" CPU, I still think imposing this
> behaviour on all users without any discrimination is wrong.
>
> Look at what it does if an interrupt is a wake-up source. You'd
> pointlessly requeue the interrupt (bonus points if the irqchip doesn't
> provide a HW-based retrigger mechanism).
>
> I still maintain that this change should only be applied for the
> particular interrupts that *require* it, and not as a blanket change
> affecting everything under the sun. I have proposed such a change in
> the past, feel free to use it or roll your own.
Thanks for the example of where this blanket functionality wouldn't be
desired - I'll re-work this to introduce and use
the IRQD_RESEND_WHEN_IN_PROGRESS flag as you originally suggested.
Just one more thing before I post V3: are you okay with doing the resend
here *after* the handler has finished running, and using the IRQS_PENDING flag
to know to resend it? Or would you like it to be resent in
the !irq_may_run(desc) block as you suggested?
I have a slight preference for doing it afterwards, only once we know the
interrupt is ready to be run again. That way check_irq_resend() does not
need to be modified to cater for multiple retries.
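To make the "resend after the handler" option concrete, here is a rough
userspace model of the idea. It is only a sketch and not kernel code: every
name in it (model_desc, MODEL_INPROGRESS, MODEL_PENDING, the model_*
functions) is made up for illustration, and the resend_when_in_progress
field merely stands in for the per-interrupt opt-in flag discussed above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state bits; these are NOT the kernel's definitions. */
#define MODEL_INPROGRESS 0x1u
#define MODEL_PENDING    0x2u

/* Hypothetical, heavily simplified stand-in for an interrupt descriptor. */
struct model_desc {
	unsigned int state;           /* MODEL_* bits */
	bool resend_when_in_progress; /* stand-in for the per-IRQ opt-in */
	int handled;                  /* completed handler runs */
	int resent;                   /* resend requests issued */
};

/* The handler starts running on the first CPU. */
static void model_handler_start(struct model_desc *d)
{
	d->state |= MODEL_INPROGRESS;
}

/*
 * The same interrupt fires on another CPU while the handler is still
 * running. Nothing is resent here; the event is only remembered, and
 * only for interrupts that opted in.
 */
static void model_irq_arrives_while_running(struct model_desc *d)
{
	if ((d->state & MODEL_INPROGRESS) && d->resend_when_in_progress)
		d->state |= MODEL_PENDING;
}

/*
 * The handler finishes. If an interrupt was remembered in the meantime,
 * request exactly one resend, however many arrivals were recorded.
 */
static void model_handler_finish(struct model_desc *d)
{
	assert(d->state & MODEL_INPROGRESS);

	d->handled++;
	d->state &= ~MODEL_INPROGRESS;

	if (d->state & MODEL_PENDING) {
		d->state &= ~MODEL_PENDING;
		d->resent++;
	}
}
```

In this model several arrivals during one handler run collapse into a single
resend once the handler has returned, which is why no retry loop is needed
on the resend path.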
JG