Message-ID: <221f6dadb6d8ce06f30a24baaa2777e90d75b130.camel@redhat.com>
Date: Mon, 26 Jan 2026 17:26:47 -0500
From: Radu Rendec <rrendec@...hat.com>
To: Jon Hunter <jonathanh@...dia.com>, Thomas Gleixner <tglx@...nel.org>,
	Manivannan Sadhasivam <mani@...nel.org>
Cc: Daniel Tsai <danielsftsai@...gle.com>, Marek Behún <kabel@...nel.org>,
	Krishna Chaitanya Chundru <quic_krichai@...cinc.com>,
	Bjorn Helgaas <bhelgaas@...gle.com>, Rob Herring <robh@...nel.org>,
	Krzysztof Wilczyński <kwilczynski@...nel.org>,
	Lorenzo Pieralisi <lpieralisi@...nel.org>,
	Jingoo Han <jingoohan1@...il.com>, Brian Masney <bmasney@...hat.com>,
	Eric Chanudet <echanude@...hat.com>,
	Alessandro Carminati <acarmina@...hat.com>,
	Jared Kangas <jkangas@...hat.com>, linux-pci@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	"linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>
Subject: Re: [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support

Hi Jon,

On Mon, 2026-01-26 at 22:07 +0000, Jon Hunter wrote:
> On 26/01/2026 07:59, Thomas Gleixner wrote:
> > On Thu, Jan 22 2026 at 18:31, Radu Rendec wrote:
> > > The CPUs are taken offline one by one, starting with CPU 7. The code in
> > > question runs on the dying CPU, and with hardware interrupts disabled
> > > on all CPUs. The (simplified) call stack looks like this:
> > >
> > > irq_migrate_all_off_this_cpu
> > > for_each_active_irq
> > > migrate_one_irq
> > > irq_do_set_affinity
> > > irq_chip_redirect_set_affinity (via chip->irq_set_affinity)
> > >
> > > The debug patch I gave you adds:
> > > * a printk to irq_chip_redirect_set_affinity (which is very small)
> > > * a printk at the beginning of migrate_one_irq
> > >
> > > Also, the call to irq_do_set_affinity is almost the last thing that
> > > happens in migrate_one_irq, and that for_each_active_irq loop is quite
> > > small too. So, there isn't much happening between the printk in
> > > irq_chip_redirect_set_affinity for the msi irq (which we do see in the
> > > log) and the printk in migrate_one_irq for the next irq (which we don't
> > > see).
> >
> > This doesn't make any sense at all. irq_chip_redirect_set_affinity() is
> > only accessing interrupt descriptor associated memory and the new
> > redirection CPU is the same as the previous one as the mask changes from
> > 0xff to 0x7f and therefore cpumask_first() yields 0 in both cases.
> >
> > According to the provided dmesg, this happens on linux-next.
> >
> > Jon, can you please validate that this happens as well on
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/msi
>
>
> I tried this branch and I see suspend failing with that branch too. If I
> revert this change on top of your branch or -next, I don't see any
> problems.
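
Thomas's point about cpumask_first() checks out, by the way: going from
0xff to 0x7f only clears the top bit of the mask, so the lowest set bit
(and with it the redirection target) is still CPU 0. A throwaway
userspace sketch, just to illustrate the bit math (first_cpu() is my
stand-in for cpumask_first(), not kernel code):

#include <stdio.h>

/*
 * Stand-in for cpumask_first(): index of the lowest set bit.
 * Only valid for a non-zero mask.
 */
static int first_cpu(unsigned int mask)
{
	return __builtin_ctz(mask);
}

int main(void)
{
	/* 0xff = CPUs 0-7 online; 0x7f = CPU 7 offline. */
	printf("%d %d\n", first_cpu(0xff), first_cpu(0x7f));
	return 0;	/* prints "0 0" */
}

So the target CPU doesn't change across this hotplug step, and the
set_affinity callback itself shouldn't be doing anything interesting
here.
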
The closest hardware I have access to is a Jetson Xavier NX, and you
already mentioned you couldn't reproduce the issue there (and it looks
like I can't even get hold of that board anyway). So I'm going to ask
you to test a few more things for me.

Can you please apply the patch below on top of the previous one I sent?
The suspect is the desc->lock raw spinlock taken in
irq_migrate_all_off_this_cpu(), although I can't think of any reason
why it shouldn't be free. But I don't have any better idea, and I would
like to narrow down the spot where hotplug gets stuck.

diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index d8c62547f9d06..69c44da68e3a9 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -178,9 +178,11 @@ void irq_migrate_all_off_this_cpu(void)
 	for_each_active_irq(irq) {
 		bool affinity_broken;
 
+		pr_info("%s: irq %u\n", __func__, irq);
 		desc = irq_to_desc(irq);
 		scoped_guard(raw_spinlock, &desc->lock) {
 			affinity_broken = migrate_one_irq(desc);
+			pr_info("%s: migrate_one_irq -> %u\n", __func__, affinity_broken);
 			if (affinity_broken && desc->affinity_notify)
 				irq_affinity_schedule_notify_work(desc);
 		}
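
Assuming the loop really does get stuck on desc->lock, the tail of the
log should make it obvious: the last line would be an "irq %u" print
with no matching "migrate_one_irq" result after it. Something like this
(irq numbers made up for illustration):

irq_migrate_all_off_this_cpu: irq 25
irq_migrate_all_off_this_cpu: migrate_one_irq -> 0
irq_migrate_all_off_this_cpu: irq 26    <- stuck, no result line follows

If both lines show up for every irq, then we get past this loop and the
problem is somewhere else.
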
--
Thanks,
Radu