Message-ID: <d9b89d5342e4268ab908791b403b5eca61aabb5c.camel@redhat.com>
Date: Tue, 07 Oct 2025 18:46:57 -0400
From: Radu Rendec <rrendec@...hat.com>
To: Thomas Gleixner <tglx@...utronix.de>, Manivannan Sadhasivam
<mani@...nel.org>
Cc: Daniel Tsai <danielsftsai@...gle.com>, Marek Behún
<kabel@...nel.org>, Krishna Chaitanya Chundru <quic_krichai@...cinc.com>,
Bjorn Helgaas <bhelgaas@...gle.com>, Rob Herring <robh@...nel.org>,
Krzysztof Wilczyński <kwilczynski@...nel.org>, Lorenzo
Pieralisi <lpieralisi@...nel.org>, Jingoo Han <jingoohan1@...il.com>,
Brian Masney <bmasney@...hat.com>, Eric Chanudet <echanude@...hat.com>,
Alessandro Carminati <acarmina@...hat.com>, Jared Kangas
<jkangas@...hat.com>, linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/3] genirq: Add interrupt redirection infrastructure
On Tue, 2025-10-07 at 22:04 +0200, Thomas Gleixner wrote:
> On Mon, Oct 06 2025 at 18:38, Radu Rendec wrote:
> > +
> > +static bool demux_redirect_remote(struct irq_desc *desc)
> > +{
> > +#ifdef CONFIG_SMP
>
> Please have a function and a stub for the !SMP case.
Will do.
> > + const struct cpumask *m = irq_data_get_effective_affinity_mask(&desc->irq_data);
> > + unsigned int target_cpu = READ_ONCE(desc->redirect.fallback_cpu);
>
> what means fallback_cpu in this context? That's confusing at best.
Please see below.
> > + if (!cpumask_test_cpu(smp_processor_id(), m)) {
>
> Why cpumask_test?
>
> if (target != smp_processor_id()
>
> should be good enough and understandable :)
This is where I deviated from your initial implementation, and I tried
to explain it in the cover letter (text reproduced below).
| Instead of keeping track of the parent interrupt's affinity setting (or
| rather the first CPU in its affinity mask) and attempting to pick the
| same CPU for the child (as the target CPU) if possible, I just check if
| the child handler fires on a CPU that's part of its affinity mask (which
| is already stored anyway). As an optimization for the case when the
| current CPU is *not* part of the mask and the handler needs to be
| redirected, I pre-calculate and store the first CPU in the mask, at the
| time when the child affinity is set. In my opinion, this is simpler and
| cleaner, at the expense of a cpumask_test_cpu() call on the fast path,
| because:
| - It no longer needs to keep track of the parent interrupt's affinity
| setting.
| - If the parent interrupt can run on more than one CPU, the child can
| also run on any of those CPUs without being redirected (in case the
| child's affinity mask is the same as the parent's or a superset).
Let me give an example:
- parent affinity is set to 0,1,2,3
- child affinity is set to 2,3
- the parent (hardware) interrupt fires on CPU 3
In the original implementation, the child target would be pre-calculated
as CPU 2, so in the scenario above the child would be redirected to
CPU 2. But in reality there is no need to redirect, because CPU 3 is
already part of the child's affinity mask.
Now, to answer your earlier question: "fallback_cpu" is the CPU we
redirect to when the current CPU is not part of the child's affinity
mask.
> > + /* Protect against shutdown */
> > + if (desc->action)
> > + irq_work_queue_on(&desc->redirect.work, target_cpu);
>
> Can you please document why this is correct vs. CPU hotplug (especially unplug)?
>
> I think it is, but I didn't look too carefully.
Sure. To be honest, I left this unchanged from your original version and
didn't give it much thought. I'll look closer and document it in the
next version.
> > +/**
> > + * generic_handle_demux_domain_irq - Invoke the handler for a hardware interrupt
> > + * of a demultiplexing domain.
> > + * @domain: The domain where to perform the lookup
> > + * @hwirq: The hardware interrupt number to convert to a logical one
> > + *
> > + * Returns: True on success, or false if lookup has failed
> > + */
> > +bool generic_handle_demux_domain_irq(struct irq_domain *domain, unsigned int hwirq)
> > +{
> > + struct irq_desc *desc = irq_resolve_mapping(domain, hwirq);
> > +
> > + if (unlikely(!desc))
> > + return false;
> > +
> > + scoped_guard(raw_spinlock, &desc->lock) {
> > + if (desc->irq_data.chip->irq_pre_redirect)
> > + desc->irq_data.chip->irq_pre_redirect(&desc->irq_data);
>
> I'd rather see that in the redirect function above.
Do you mean the scoped_guard() and the irq_pre_redirect() call? If
demux_redirect_remote() becomes a stub when CONFIG_SMP is not defined,
then irq_pre_redirect() will not be called either, even if it is set in
struct irq_chip.
Now, I don't see any reason why irq_pre_redirect would be set in the
non-SMP case, and in fact it isn't in (currently) the only
implementation, which is dwc PCI (patch 3). Redirection just doesn't
make sense if you have only one CPU.
Perhaps (struct irq_chip).irq_pre_redirect should not even exist (as a
field in the structure) unless CONFIG_SMP is defined?
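For concreteness, here is one way the split could look while keeping the irq_pre_redirect() call in the common path (an illustrative sketch against the quoted code, not a proposed patch):

```c
#ifdef CONFIG_SMP
static bool demux_redirect_remote(struct irq_desc *desc)
{
	const struct cpumask *m = irq_data_get_effective_affinity_mask(&desc->irq_data);

	if (cpumask_test_cpu(smp_processor_id(), m))
		return false;

	/* Protect against shutdown */
	if (desc->action)
		irq_work_queue_on(&desc->redirect.work,
				  READ_ONCE(desc->redirect.fallback_cpu));
	return true;
}
#else
/* On UP the handler always runs on the only CPU; never redirect. */
static inline bool demux_redirect_remote(struct irq_desc *desc)
{
	return false;
}
#endif

bool generic_handle_demux_domain_irq(struct irq_domain *domain, unsigned int hwirq)
{
	struct irq_desc *desc = irq_resolve_mapping(domain, hwirq);

	if (unlikely(!desc))
		return false;

	scoped_guard(raw_spinlock, &desc->lock) {
		/* Still invoked on UP, even though the redirect is a stub */
		if (desc->irq_data.chip->irq_pre_redirect)
			desc->irq_data.chip->irq_pre_redirect(&desc->irq_data);
		if (demux_redirect_remote(desc))
			return true;
	}
	return !handle_irq_desc(desc);
}
```

With this shape, the !SMP stub question and the irq_pre_redirect question become independent: the callback stays reachable regardless of CONFIG_SMP, even if no non-SMP user exists today.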
> > + if (demux_redirect_remote(desc))
> > + return true;
> > + }
> > + return !handle_irq_desc(desc);
> > +}
> > +EXPORT_SYMBOL_GPL(generic_handle_demux_domain_irq);
> > +
> > #endif
> >
> > /* Dynamic interrupt handling */
> > diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> > index c94837382037e..ed8f8b2667b0b 100644
> > --- a/kernel/irq/manage.c
> > +++ b/kernel/irq/manage.c
> > @@ -35,6 +35,16 @@ static int __init setup_forced_irqthreads(char *arg)
> > early_param("threadirqs", setup_forced_irqthreads);
> > #endif
> >
> > +#ifdef CONFIG_SMP
> > +static inline void synchronize_irqwork(struct irq_desc *desc)
> > +{
> > + /* Synchronize pending or on the fly redirect work */
> > + irq_work_sync(&desc->redirect.work);
> > +}
> > +#else
> > +static inline void synchronize_irqwork(struct irq_desc *desc) { }
> > +#endif
> > +
> > static int __irq_get_irqchip_state(struct irq_data *d, enum irqchip_irq_state which, bool *state);
> >
> > static void __synchronize_hardirq(struct irq_desc *desc, bool sync_chip)
> > @@ -43,6 +53,8 @@ static void __synchronize_hardirq(struct irq_desc *desc, bool sync_chip)
> > bool inprogress;
> >
> > do {
> > + synchronize_irqwork(desc);
>
> That can't work. irq_work_sync() requires interrupts and preemption
> enabled. But __synchronize_hardirq() can be invoked from interrupt or
> preemption disabled context.
>
> And it's not required at all because that's not any different from a
> hardware device interrupt. Either it is already handled (IRQ_INPROGRESS)
> on some other CPU or not. That code can't anticipate that there is an
> interrupt somewhere in flight in the system, not yet raised and
> handled on a CPU.
>
> Though you want to invoke it in __synchronize_irq() _before_ invoking
> __synchronize_hardirq().
Same comment as before: I left this unchanged from your original
version and didn't give it much thought. This is definitely one of
those "back to the drawing board" cases. Thanks for the guidance; I
will give it more thought and address it in the next version.
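If I read the suggestion right, the fix would look roughly like this (a sketch only, untested, and the surrounding __synchronize_irq() body is elided):

```c
/*
 * Flush the redirect irq_work from __synchronize_irq(), which runs with
 * interrupts and preemption enabled, instead of from
 * __synchronize_hardirq(), which may be called with them disabled.
 */
static void __synchronize_irq(struct irq_desc *desc)
{
	/*
	 * Once the pending or in-flight redirect work has run, the
	 * redirected handler is either IRQ_INPROGRESS on some CPU
	 * (covered by __synchronize_hardirq() below) or already done.
	 */
	synchronize_irqwork(desc);
	__synchronize_hardirq(desc, true);
	/* then wait for threaded handlers, as before */
}
```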
--
Thanks a lot for reviewing!
Best regards,
Radu