Message-ID: <87ecn23q6e.ffs@tglx>
Date: Tue, 03 Feb 2026 14:35:53 +0100
From: Thomas Gleixner <tglx@...nel.org>
To: Yingjun Ni <yingjun.ni@...cv-computing.com>, anup@...infault.org,
pjw@...nel.org, palmer@...belt.com, aou@...s.berkeley.edu, alex@...ti.fr
Cc: linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
yingjun.ni@...cv-computing.com
Subject: Re: [PATCH] irqchip/riscv-imsic: Fix irq migration failure issue when cpu hotplug.
On Tue, Feb 03 2026 at 16:02, Yingjun Ni wrote:
> Add a null pointer check for irq_write_msi_msg to fix NULL pointer
> dereference issue when migrating irq.
>
> Modify the return value of imsic_irq_set_affinity to let the subdomain
> PCI-MSIX migrate the irq to a new cpu when cpu hotplug.
>
> Don't set vec->move_next in imsic_vector_move_update when the cpu is
> offline, because it will never be cleared.

You completely fail to explain the actual problem and the root cause. See
https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog

> drivers/irqchip/irq-riscv-imsic-platform.c | 8 ++++++--
> drivers/irqchip/irq-riscv-imsic-state.c | 5 +++++
> 2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/irqchip/irq-riscv-imsic-platform.c b/drivers/irqchip/irq-riscv-imsic-platform.c
> index 643c8e459611..131e4f2b5431 100644
> --- a/drivers/irqchip/irq-riscv-imsic-platform.c
> +++ b/drivers/irqchip/irq-riscv-imsic-platform.c
> @@ -93,9 +93,13 @@ static void imsic_irq_compose_msg(struct irq_data *d, struct msi_msg *msg)
> static void imsic_msi_update_msg(struct irq_data *d, struct imsic_vector *vec)
> {
> struct msi_msg msg = { };
> + struct irq_chip *irq_chip = irq_data_get_irq_chip(d);
> +
> + if (!irq_chip->irq_write_msi_msg)
> + return;

I have no idea how this ever worked. The irq_data pointer belongs to the
IMSIC base domain, which definitely does not have an irq_write_msi_msg()
callback and never can have one.

The write message callback is always implemented by the top-most domain,
in this case the PCI/MSI[x] per-device domain.

So this code is simply broken and your NULL pointer check just makes it
differently broken.

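The hierarchy point can be illustrated with a small userspace model. It is a sketch, not kernel code: every `*_model` type and helper below is hypothetical. It assumes, as stated above, that only the top-most (per-device) domain provides the write callback, and it reaches that level through the descriptor. This is analogous to the kernel, where `irq_get_irq_data(d->irq)` returns the descriptor's own irq_data, i.e. the outermost level of the hierarchy; whether that alone is the right fix for this driver is not claimed here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model of a two-level hierarchical irq domain. All *_model
 * types and helpers are hypothetical stand-ins, not kernel definitions.
 */
struct msi_msg_model {
	unsigned int address_lo;
	unsigned int address_hi;
	unsigned int data;
};

struct irq_data_model;

struct irq_chip_model {
	const char *name;
	/* Only the top-most (per-device) domain implements this callback. */
	void (*irq_write_msi_msg)(struct irq_data_model *d,
				  struct msi_msg_model *msg);
};

struct irq_data_model {
	struct irq_chip_model *chip;
	struct irq_data_model *parent_data;	/* links point downwards only */
};

/* Like irq_desc: the descriptor embeds the top-most irq_data. */
struct irq_desc_model {
	struct irq_data_model irq_data;
};

/* What the "device" has been programmed with. */
static struct msi_msg_model device_msg;

static void pci_msix_write_msg(struct irq_data_model *d,
			       struct msi_msg_model *msg)
{
	(void)d;
	device_msg = *msg;
}

/* Base domain: no write callback, and it can never have one. */
static struct irq_chip_model imsic_chip = { "IMSIC", NULL };
static struct irq_chip_model pci_msix_chip = { "PCI-MSIX", pci_msix_write_msg };

/* Shape of the patched code: NULL check, then call on the base irq_data. */
static int update_msg_via_base(struct irq_data_model *base,
			       struct msi_msg_model *msg)
{
	if (!base->chip->irq_write_msi_msg)
		return -1;	/* no crash, but the device is never updated */
	base->chip->irq_write_msi_msg(base, msg);
	return 0;
}

/* Sketch: write through the top-most level, reached via the descriptor. */
static int update_msg_via_top(struct irq_desc_model *desc,
			      struct msi_msg_model *msg)
{
	struct irq_data_model *top = &desc->irq_data;

	assert(top->chip->irq_write_msi_msg != NULL);
	top->chip->irq_write_msi_msg(top, msg);
	return 0;
}
```

In this model the NULL check turns the crash into a silent no-op: `update_msg_via_base()` returns without reprogramming the device, so the interrupt would stay steered at the old vector. That is one plausible reading of "differently broken".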
> imsic_irq_compose_vector_msg(vec, &msg);
> - irq_data_get_irq_chip(d)->irq_write_msi_msg(d, &msg);
> + irq_chip->irq_write_msi_msg(d, &msg);
> }
>
> static int imsic_irq_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
> @@ -173,7 +177,7 @@ static int imsic_irq_set_affinity(struct irq_data *d, const struct cpumask *mask
> /* Move state of the old vector to the new vector */
> imsic_vector_move(old_vec, new_vec);
>
> - return IRQ_SET_MASK_OK_DONE;
> + return IRQ_SET_MASK_OK;

Have you actually looked at the consequences of this change?

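One consequence can be modeled in userspace. In the kernel, `IRQ_SET_MASK_OK_DONE` means the same as `IRQ_SET_MASK_OK` for the core, but tells stacked irqchips to skip all descendant irqchips. The sketch below assumes, for illustration, that the per-device domain's affinity callback behaves like `msi_domain_set_affinity()`, which recomposes and rewrites the MSI message unless the parent returns `IRQ_SET_MASK_OK_DONE`. The enum mirrors the kernel names; the message log and both helpers are hypothetical, and the parent's own update is simplified to an intermediate write followed by a final one.

```c
#include <assert.h>

/* Names mirror the kernel's irq_set_affinity return codes. */
enum {
	IRQ_SET_MASK_OK = 0,
	IRQ_SET_MASK_OK_NOCOPY,
	IRQ_SET_MASK_OK_DONE,
};

/* Hypothetical record of the MSI data values written to the device. */
struct msg_log {
	unsigned int data[4];
	int n;
};

static void write_msg(struct msg_log *log, unsigned int data)
{
	assert(log->n < 4);
	log->data[log->n++] = data;
}

/*
 * IMSIC-like parent: it has already programmed the device itself
 * (modeled as an intermediate and a final write) and reports @ret_code.
 */
static int parent_set_affinity(struct msg_log *log, unsigned int tmp_data,
			       unsigned int new_data, int ret_code)
{
	write_msg(log, tmp_data);
	write_msg(log, new_data);
	return ret_code;
}

/* Per-device child, modeled on the check in msi_domain_set_affinity(). */
static int child_set_affinity(struct msg_log *log, unsigned int tmp_data,
			      unsigned int new_data, int ret_code)
{
	int ret = parent_set_affinity(log, tmp_data, new_data, ret_code);

	/* Anything but OK_DONE makes the MSI layer compose and write again. */
	if (ret >= 0 && ret != IRQ_SET_MASK_OK_DONE)
		write_msg(log, new_data);
	return ret;
}
```

With `IRQ_SET_MASK_OK_DONE` the log holds only the parent's two writes; with `IRQ_SET_MASK_OK` a third write from the MSI layer lands after the parent already considers the move complete.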
> }
>
> static void imsic_irq_force_complete_move(struct irq_data *d)
> diff --git a/drivers/irqchip/irq-riscv-imsic-state.c b/drivers/irqchip/irq-riscv-imsic-state.c
> index b6cebfee9461..cd1bf9516878 100644
> --- a/drivers/irqchip/irq-riscv-imsic-state.c
> +++ b/drivers/irqchip/irq-riscv-imsic-state.c
> @@ -362,6 +362,10 @@ static bool imsic_vector_move_update(struct imsic_local_priv *lpriv,
> /* Update enable and move details */
> enabled = READ_ONCE(vec->enable);
> WRITE_ONCE(vec->enable, new_enable);
> +
> + if (!cpu_online(vec->cpu) && is_old_vec)
> + goto out;

This is definitely not correct, as this should still clean up software
state, no?

Thanks,
tglx