[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <re3qxwkm3lu7o77kyfswwennqvtpyonlj4zajt5eu7z5zwkosr@mwacqq6bpbk4>
Date: Thu, 24 Jul 2025 09:34:30 +0800
From: Inochi Amaoto <inochiama@...il.com>
To: Thomas Gleixner <tglx@...utronix.de>,
Inochi Amaoto <inochiama@...il.com>, Chen Wang <unicorn_wang@...look.com>
Cc: linux-kernel@...r.kernel.org, Anup Patel <anup@...infault.org>,
Paul Walmsley <paul.walmsley@...ive.com>, Samuel Holland <samuel.holland@...ive.com>,
Marc Zyngier <maz@...nel.org>, Nam Cao <namcao@...utronix.de>
Subject: Re: Affinity setting problem for emulated MSI on PLIC
On Thu, Jul 24, 2025 at 12:50:05AM +0200, Thomas Gleixner wrote:
> Cc'ing more people as this goes way beyond the SG204x driver
>
> On Wed, Jul 23 2025 at 06:45, Inochi Amaoto wrote:
> > SG2044 and SG2042 has a msi controller that converts the PLIC interrupt
> > to a MSI one. It works at the most case, but failed when probing NVME.
> > The driver complains "nvme nvme0: I/O tag XXX (XXX) QID XX timeout,
> > completion polled". After some test, I found this is caused by some
> > broken interrupt, which is disable on the underlying PLIC chip after
> > setting affinity. As the MSI chip does not have a enable function,
> > irq_startup only calls irq_unmask. This make the underlying interrupt
> > at PLIC is not enabled.
>
> So how did that stuff ever work?
>
> > I have done a hack by changing the mask/unmask to disable/enable and
> > setting MSI_FLAG_PCI_MSI_MASK_PARENT to solve this and it works.
>
> That flag should have been set from the very beginning - I missed it
> when merging the original SG2042 driver. :(
>
> It needs that flag too. Otherwise the chip's irq_mask/unmask functions
> are never invoked from the PCI layer.
>
> Aside of that SG2042 suffers from exactly the same enable problem as
> SG2044 because both rely on the sifive-plic driver.
>
> May I ask the obvious question:
>
> How did this obviously disfunctional driver gain Tested-by and other
> relevant tags?
>
I think the SG2042 pci driver does not support affinity setting, so it
is ignored. But the detail thing I will cc Chen Wang. I guess he can give
some details.
For SG2044, I have tested at old version and it worked when submitting.
And I guess it is because the commit [1], which remove the irq_set_affinity.
[1] https://lore.kernel.org/r/20240723132958.41320-6-marek.vasut+renesas@mailbox.org
IIRC, the worked version I tested has no affinity set and all irqs
are routed to 0, which is different from the behavior now. Another
mysterious things is that NVME is affected. In fact, I have tested
various PCIe device, include RX550, some PCIe to USB/SATA cards and
Mellanox CX341 and X550-T2, all of the them are functional.
> > But I wonder whether there is something better to solve this problem?
> > (The hack I have done is at the end of mail)
>
> It's a horrible hack as you know already :)
>
> > --- a/drivers/irqchip/irq-sg2042-msi.c
> > +++ b/drivers/irqchip/irq-sg2042-msi.c
> > @@ -94,6 +94,20 @@ static const struct irq_chip sg2042_msi_middle_irq_chip = {
> > .irq_compose_msi_msg = sg2042_msi_irq_compose_msi_msg,
> > };
> >
> > +/*
> > + * As PLIC can only apply affinity when enabling, so always call enable
> > + * when unmasking interrupt.
>
> This comment is misleading at best.
>
> The point is that PLIC needs two things to enable an interrupt:
>
> 1) Enable the routing bit for the target CPU
>
> 2) Unmask the interrupt by fiddling with the priority register
>
> The PLIC driver does #1 and #2 in the irq_enable() callback, but only #2
> in the irq_unmask() callback. That's obviously done to avoid fiddling
> with the routing bit and the related lock for every mask/unmask
> operation.
>
Right, this is the thing I should update.
> > +static void sg2044_msi_irq_mask(struct irq_data *d)
> > +{
> > + irq_chip_disable_parent(d);
>
> This is a blatant violation of the interrupt hierarchy and you are well
> aware of it.
>
> There is a reason why there are these different callbacks...
>
Yeah, I know. I guess it needs irq_shutdown for this. It is just
a dirty hack. And I am happy to be told I can do something generic.
I am pretty feared to do that....
> > +static void sg2044_msi_irq_unmask(struct irq_data *d)
> > +{
> > + irq_chip_enable_parent(d);
> > +}
> > +
> > static void sg2044_msi_irq_ack(struct irq_data *d)
> > {
> > struct sg204x_msi_chipdata *data = irq_data_get_irq_chip_data(d);
> > @@ -115,8 +129,8 @@ static void sg2044_msi_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *m
> > static struct irq_chip sg2044_msi_middle_irq_chip = {
> > .name = "SG2044 MSI",
> > .irq_ack = sg2044_msi_irq_ack,
> > - .irq_mask = irq_chip_mask_parent,
> > - .irq_unmask = irq_chip_unmask_parent,
> > + .irq_mask = sg2044_msi_irq_mask,
> > + .irq_unmask = sg2044_msi_irq_unmask,
>
> Again. Why only for SG2044?
>
> Fact is that the PLIC requires irq_enable() to be invoked to get the
> interrupt started, which is never invoked as the PCI/MSI chips do not
> populate that callback. So SG2042 _must_ be affected as well, no?
>
I guess it is. This patch is more like something prototype. The things
about for SG2042 I have not confirmed from Wang. And I think he will
give some detail for it.
> PCI/MSI interrupts won't ever populate irq_[en|dis]able() because that
> prevents the core code from lazy masking the interrupt on disable_irq()
> in order to spare an enable_irq() when no interrupt is raised by the
> device between the disable and the enable operation.
>
> That stays a fact unless someone rewrites the historicaly grown logic of
> that stuff from ground up, which requires to audit every single irqchip
> driver in tree. At some point we probably have to do that, but my
> copious spare time is definitely not going to be wasted on that.
>
> As this is only related to the startup/shutdown phase of an interrupt,
> the obvious solution is to populate the irq_startup() and irq_shutdown()
> callbacks in the PCI layer.
>
This is pretty good and I have considered adding startup/shutdown
callback too, it is OK for me to take this.
> That requires:
>
> - making the PCI/MSI code populate those callbacks with straight
> forward helpers in the core interrupt code for startup/shutdown and
> only invoke those helpers when MSI_FLAG_PCI_MSI_MASK_PARENT is set.
>
> That's also the proper mechanism to solve Marc's problem in
>
> https://lore.kernel.org/all/20250517103011.2573288-1-maz@kernel.org
>
> as it allows to conditionally overwrite the irq_[un]mask()
> callbacks of the PCI/MSI[-X] chip with irq_chip_[un]mask_parent().
> Obviously not based on MSI_FLAG_PCI_MSI_MASK_PARENT. That needs a
> new flag to opt in, but that's a minor detail.
>
> This has no impact on any other driver AFAICT, but that obviously
> needs to be looked at with a less sleep deprived brain.
>
> - the PLIC driver to switch the irq_[en|dis]able() callbacks over to
> irq_shutdown/startup(), which needs some extra care to handle
> interrupt affinity settings, which is an interesting issue by
> itself. See the large comment I left in the patch below.
>
> - to update the SG204x driver accordingly
>
> See below for a completely uncompiled and therefore untested patch,
> which needs to be split up into a gazillion of patches obviously. As it
> is uncompiled and untested, you can keep the bugs for yourself and fix
> them.
>
This is a pretty detailed instruction for me. Thanks
> There is a big fat comment in the PLIC driver related section, which is
> just a notepad to dump my findings and thoughts for posterity. That part
> was broken before. It is really scary and needs some real scrunity
> beyond the "works for me" and "looks good" mindset. I don't care much as
> I wasted enough time on this already, but I have to admit that looking
> at this mess inspired me to find a non-horrible solution for the problem
> Marc is trying to solve (not that I tried really hard before...).
>
> Obviously none of that PLIC/SG204x stuff is going near any tree anytime
> soon. The SG204x driver has been broken since it was merged half a year
> ago without anyone noticing, so there is no rush to fix it. Same for the
> PLIC muck.
>
As it is something pretty niche, many things related to RISC-V
are not well tested. :/
Anyway, I will try my best to find a solution for this. It is a good
thing for me to learn something about kernel.
Thanks,
Inochi
> The generic and PCI/MSI[-X] changes have a value on their own and are
> sane enough to go forward independent of that, once I come around to
> split them out with proper change logs unless someone beats me to it.
>
> Thanks,
>
> tglx
> ---
> --- a/drivers/irqchip/irq-sg2042-msi.c
> +++ b/drivers/irqchip/irq-sg2042-msi.c
> @@ -86,6 +86,8 @@ static void sg2042_msi_irq_compose_msi_m
> static const struct irq_chip sg2042_msi_middle_irq_chip = {
> .name = "SG2042 MSI",
> .irq_ack = sg2042_msi_irq_ack,
> + .irq_startup = irq_chip_startup_parent,
> + .irq_shutdown = irq_chip_shutdown_parent,
> .irq_mask = irq_chip_mask_parent,
> .irq_unmask = irq_chip_unmask_parent,
> #ifdef CONFIG_SMP
> @@ -115,6 +117,8 @@ static void sg2044_msi_irq_compose_msi_m
> static struct irq_chip sg2044_msi_middle_irq_chip = {
> .name = "SG2044 MSI",
> .irq_ack = sg2044_msi_irq_ack,
> + .irq_startup = irq_chip_startup_parent,
> + .irq_shutdown = irq_chip_shutdown_parent,
> .irq_mask = irq_chip_mask_parent,
> .irq_unmask = irq_chip_unmask_parent,
> #ifdef CONFIG_SMP
> @@ -186,7 +190,8 @@ static const struct irq_domain_ops sg204
> };
>
> #define SG2042_MSI_FLAGS_REQUIRED (MSI_FLAG_USE_DEF_DOM_OPS | \
> - MSI_FLAG_USE_DEF_CHIP_OPS)
> + MSI_FLAG_USE_DEF_CHIP_OPS | \
> + MSI_FLAG_PCI_MSI_MASK_PARENT)
>
> #define SG2042_MSI_FLAGS_SUPPORTED MSI_GENERIC_FLAGS_MASK
>
> @@ -201,7 +206,8 @@ static const struct msi_parent_ops sg204
> };
>
> #define SG2044_MSI_FLAGS_REQUIRED (MSI_FLAG_USE_DEF_DOM_OPS | \
> - MSI_FLAG_USE_DEF_CHIP_OPS)
> + MSI_FLAG_USE_DEF_CHIP_OPS | \
> + MSI_FLAG_PCI_MSI_MASK_PARENT)
>
> #define SG2044_MSI_FLAGS_SUPPORTED (MSI_GENERIC_FLAGS_MASK | \
> MSI_FLAG_PCI_MSIX)
> --- a/drivers/irqchip/irq-sifive-plic.c
> +++ b/drivers/irqchip/irq-sifive-plic.c
> @@ -140,14 +140,16 @@ static void plic_irq_mask(struct irq_dat
> writel(0, priv->regs + PRIORITY_BASE + d->hwirq * PRIORITY_PER_ID);
> }
>
> -static void plic_irq_enable(struct irq_data *d)
> +static unsigned int plic_irq_startup(struct irq_data *d)
> {
> plic_irq_toggle(irq_data_get_effective_affinity_mask(d), d, 1);
> plic_irq_unmask(d);
> + return 0;
> }
>
> -static void plic_irq_disable(struct irq_data *d)
> +static void plic_irq_shutdown(struct irq_data *d)
> {
> + plic_irq_mask(d);
> plic_irq_toggle(irq_data_get_effective_affinity_mask(d), d, 0);
> }
>
> @@ -179,12 +181,57 @@ static int plic_set_affinity(struct irq_
> if (cpu >= nr_cpu_ids)
> return -EINVAL;
>
> - plic_irq_disable(d);
> + /* Invalidate the original routing entry */
> + plic_irq_toggle(irq_data_get_effective_affinity_mask(d), d, 0);
>
> irq_data_update_effective_affinity(d, cpumask_of(cpu));
>
> - if (!irqd_irq_disabled(d))
> - plic_irq_enable(d);
> + /*
> + * Update the routing entry for the new target CPU.
> + *
> + * The below comment is not for a final patch, it's just
> + * documentation for this combo patch, which obviously needs to be
> + * split up into a gazillion of patches. So I use this as a notepad
> + * in order to not forget the gory details, which are changelog
> + * material :)
> + *
> + * This is on purpose not bug compatible with the current
> + * implementation, which unmasked the interrupt unconditionally due
> + * to:
> + *
> + * if (!irqd_irq_disabled(d))
> + * plic_irq_enable(d);
> + *
> + * which is broken as it unconditionally unmasks a masked
> + * interrupt. This was introduced with commit
> + *
> + * 6b1e0651e9ce ("irqchip/sifive-plic: Unmask interrupt in plic_irq_enable()")
> + *
> + * which in turn tried to fix the problem, which was introduced by
> + * commit
> + *
> + * a1706a1c5062 ("irqchip/sifive-plic: Separate the enable and mask operations")
> + *
> + * 6b1e0651e9ce is probably harmless, but I'm too tired and can't
> + * be bothered to validate it. See below.
> + *
> + * This all needs real scrutiny and has to be validated against
> + * both specification and implementations.
> + *
> + * AFAICT, toggling unconditionally is the right thing to do, but I
> + * might be completely wrong as ususal. For me the two mentioned
> + * commits above seem to be contradictionary, but my tired brain
> + * can't decode it right now and therefore I leave that for the
> + * PLIC wizards as _their_ homework. Not that I have high
> + * expectations on that given the trail of Tested-by and other
> + * tags. "Works for me" by some definition of "works" seems to be
> + * the prevailing principle here. "Correctness first" is obviously
> + * overrated as usual up to the point where the real great hacks
> + * come along to "fix" the resulting sh*t. Unfortunately that costs
> + * the time of people, who have not been responsible for the
> + * problems in the first place...
> + */
> + plic_irq_toggle(cpumask_of(cpu), d, 1);
>
> return IRQ_SET_MASK_OK_DONE;
> }
> @@ -192,8 +239,8 @@ static int plic_set_affinity(struct irq_
>
> static struct irq_chip plic_edge_chip = {
> .name = "SiFive PLIC",
> - .irq_enable = plic_irq_enable,
> - .irq_disable = plic_irq_disable,
> + .irq_startup = plic_irq_startup,
> + .irq_shutdown = plic_irq_shutdown,
> .irq_ack = plic_irq_eoi,
> .irq_mask = plic_irq_mask,
> .irq_unmask = plic_irq_unmask,
> @@ -207,8 +254,8 @@ static struct irq_chip plic_edge_chip =
>
> static struct irq_chip plic_chip = {
> .name = "SiFive PLIC",
> - .irq_enable = plic_irq_enable,
> - .irq_disable = plic_irq_disable,
> + .irq_startup = plic_irq_startup,
> + .irq_shutdown = plic_irq_shutdown,
> .irq_mask = plic_irq_mask,
> .irq_unmask = plic_irq_unmask,
> .irq_eoi = plic_irq_eoi,
> --- a/drivers/pci/msi/irqdomain.c
> +++ b/drivers/pci/msi/irqdomain.c
> @@ -148,6 +148,23 @@ static void pci_device_domain_set_desc(m
> arg->hwirq = desc->msi_index;
> }
>
> +static inline void cond_shutdown_parent(struct irq_data *data)
> +{
> + struct msi_domain_info *info = data->domain->host_data;
> +
> + if (unlikely(info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT))
> + irq_chip_shutdown_parent(data);
> +}
> +
> +static inline unsigned int cond_startup_parent(struct irq_data *data)
> +{
> + struct msi_domain_info *info = data->domain->host_data;
> +
> + if (unlikely(info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT))
> + return irq_chip_startup_parent(data);
> + return 0;
> +}
> +
> static __always_inline void cond_mask_parent(struct irq_data *data)
> {
> struct msi_domain_info *info = data->domain->host_data;
> @@ -164,6 +181,23 @@ static __always_inline void cond_unmask_
> irq_chip_unmask_parent(data);
> }
>
> +static void pci_irq_shutdown_msi(struct irq_data *data)
> +{
> + struct msi_desc *desc = irq_data_get_msi_desc(data);
> +
> + pci_msi_mask(desc, BIT(data->irq - desc->irq));
> + cond_shutdown_parent(data);
> +}
> +
> +static unsigned int pci_irq_startup_msi(struct irq_data *data)
> +{
> + struct msi_desc *desc = irq_data_get_msi_desc(data);
> + unsigned int ret = cond_startup_parent(data);
> +
> + pci_msi_unmask(desc, BIT(data->irq - desc->irq));
> + return ret;
> +}
> +
> static void pci_irq_mask_msi(struct irq_data *data)
> {
> struct msi_desc *desc = irq_data_get_msi_desc(data);
> @@ -194,6 +228,8 @@ static void pci_irq_unmask_msi(struct ir
> static const struct msi_domain_template pci_msi_template = {
> .chip = {
> .name = "PCI-MSI",
> + .irq_startup = pci_irq_startup_msi,
> + .irq_shutdown = pci_irq_shutdown_msi,
> .irq_mask = pci_irq_mask_msi,
> .irq_unmask = pci_irq_unmask_msi,
> .irq_write_msi_msg = pci_msi_domain_write_msg,
> @@ -210,6 +246,20 @@ static const struct msi_domain_template
> },
> };
>
> +static void pci_irq_shutdown_msix(struct irq_data *data)
> +{
> + pci_msix_mask(irq_data_get_msi_desc(data));
> + cond_shutdown_parent(data);
> +}
> +
> +static unsigned int pci_irq_startup_msix(struct irq_data *data)
> +{
> + unsigned int ret = cond_startup_parent(data);
> +
> + pci_msix_unmask(irq_data_get_msi_desc(data));
> + return ret;
> +}
> +
> static void pci_irq_mask_msix(struct irq_data *data)
> {
> pci_msix_mask(irq_data_get_msi_desc(data));
> @@ -233,6 +283,8 @@ static void pci_msix_prepare_desc(struct
> static const struct msi_domain_template pci_msix_template = {
> .chip = {
> .name = "PCI-MSIX",
> + .irq_startup = pci_irq_startup_msix,
> + .irq_shutdown = pci_irq_shutdown_msix,
> .irq_mask = pci_irq_mask_msix,
> .irq_unmask = pci_irq_unmask_msix,
> .irq_write_msi_msg = pci_msi_domain_write_msg,
> --- a/include/linux/irq.h
> +++ b/include/linux/irq.h
> @@ -669,6 +669,8 @@ extern int irq_chip_set_parent_state(str
> extern int irq_chip_get_parent_state(struct irq_data *data,
> enum irqchip_irq_state which,
> bool *state);
> +extern unsigned int irq_chip_startup_parent(struct irq_data *data);
> +extern void irq_chip_shutdown_parent(struct irq_data *data);
> extern void irq_chip_enable_parent(struct irq_data *data);
> extern void irq_chip_disable_parent(struct irq_data *data);
> extern void irq_chip_ack_parent(struct irq_data *data);
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -1254,9 +1254,47 @@ int irq_chip_get_parent_state(struct irq
> EXPORT_SYMBOL_GPL(irq_chip_get_parent_state);
>
> /**
> - * irq_chip_enable_parent - Enable the parent interrupt (defaults to unmask if
> - * NULL)
> + * irq_chip_startup_parent - Startup the parent interrupt
> * @data: Pointer to interrupt specific data
> + *
> + * Invokes the irq_startup() callback of the parent if available or falls
> + * back to irq_chip_enable_parent().
> + */
> +unsigned int irq_chip_startup_parent(struct irq_data *data)
> +{
> + struct irq_data *parent = data->parent_data;
> +
> + if (parent->chip->irq_startup)
> + return parent->chip->irq_startup(parent);
> + irq_chip_enable_parent(data);
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(irq_chip_startup_parent);
> +
> +/**
> + * irq_chip_shutdown_parent - Shutdown the parent interrupt
> + * @data: Pointer to interrupt specific data
> + *
> + * Invokes the irq_shutdown() callback of the parent if available or falls
> + * back to irq_chip_disable_parent().
> + */
> +void irq_chip_shutdown_parent(struct irq_data *data)
> +{
> + struct irq_data *parent = data->parent_data;
> +
> + if (parent->chip->irq_shutdown)
> + parent->chip->irq_shutdown(parent);
> + else
> + irq_chip_disable_parent(data);
> +}
> +EXPORT_SYMBOL_GPL(irq_chip_shutdown_parent);
> +
> +/**
> + * irq_chip_enable_parent - Enable the parent interrupt
> + * @data: Pointer to interrupt specific data
> + *
> + * Invokes the irq_enable() callback of the parent if available or falls
> + * back to the irq_unmask() callback.
> */
> void irq_chip_enable_parent(struct irq_data *data)
> {
> @@ -1269,9 +1307,11 @@ void irq_chip_enable_parent(struct irq_d
> EXPORT_SYMBOL_GPL(irq_chip_enable_parent);
>
> /**
> - * irq_chip_disable_parent - Disable the parent interrupt (defaults to mask if
> - * NULL)
> + * irq_chip_disable_parent - Disable the parent interrupt
> * @data: Pointer to interrupt specific data
> + *
> + * Invokes the irq_disable() callback of the parent if available or falls
> + * back to the irq_mask() callback.
> */
> void irq_chip_disable_parent(struct irq_data *data)
> {
>
Powered by blists - more mailing lists