Message-ID: <Z6NoKq6RQWNZC3s2@U-2FWC9VHC-2323.local>
Date: Wed, 5 Feb 2025 21:31:22 +0800
From: Feng Tang <feng.tang@...ux.alibaba.com>
To: Lukas Wunner <lukas@...ner.de>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>,
Jonathan Cameron <Jonthan.Cameron@...wei.com>,
ilpo.jarvinen@...ux.intel.com, linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] PCI: Disable PCIE hotplug interrupts early when msi
is disabled
On Wed, Feb 05, 2025 at 02:31:56PM +0800, Feng Tang wrote:
> On Tue, Feb 04, 2025 at 10:14:10AM +0100, Lukas Wunner wrote:
> > On Tue, Feb 04, 2025 at 01:37:58PM +0800, Feng Tang wrote:
> > > There was an irq storm bug when testing "pci=nomsi" case, and the root
> > > cause is: 'nomsi' will disable MSI and let devices and root ports use
> > > legacy INTx interrupt, and likely make several devices/ports share one
> > > interrupt. In the failure case, BIOS doesn't disable the PCIE hotplug
> > > interrupts, and actually asserts the command-complete interrupt.
> > > As MSI is disabled, ACPI initialization code will not enumerate root
> > > port's PCIE hotplug capability, and pciehp service driver won't be
> > > enabled for the root port to handle that interrupt. Later on, when it is
> > > shared and enabled by other device drivers like NVME or NIC, the "nobody
> > > cared" irq storm happens.
> > >
> > > So disable the PCIe hotplug CCIE/HPIE interrupts in early boot phase when
> > > MSI is not enabled.
> >
> > So I think this issue should go away if disabling the interrupt
> > by portdrv is no longer conditional on
> >
> > (pcie_ports_native || host->native_pcie_hotplug)
> >
> > like I've just proposed here:
> >
> > https://lore.kernel.org/r/Z6HYuBDP6uvE1Sf4@wunner.de/
> >
> > ... in which case this patch won't be necessary. Can you confirm that?
>
> Thanks for the suggestion! I will try to get the platform for test,
> and report back.

I haven't been able to get the platform yet, but I recalled one thing:
disabling the HP interrupts inside get_port_device_capability()/portdrv_probe()
happens after nvme_probe() has run, so the irq storm may still occur, because:

* the PCIe root port's hotplug interrupt is asserted
* the interrupt line is shared with NVMe and other devices
* those device drivers enable the interrupt line before portdrv's probe()
  runs

That's why we tried to put the disabling early in the PCI initialization code.
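
The ordering problem can be sketched as a small userspace model. This is only an illustration under stated assumptions, not kernel code: struct model_port, port_asserts_intx(), disable_hp_irqs() and storm_when_line_enabled() are hypothetical names. Only the two PCI_EXP_SLTCTL_* bit values are taken from include/uapi/linux/pci_regs.h. The model assumes the port drives INTx whenever an event is pending and both CCIE and HPIE are set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Slot Control bits, values as in include/uapi/linux/pci_regs.h */
#define PCI_EXP_SLTCTL_CCIE 0x0010 /* Command Completed Interrupt Enable */
#define PCI_EXP_SLTCTL_HPIE 0x0020 /* Hot-Plug Interrupt Enable */

/* Hypothetical model of a root port that shares an INTx line. */
struct model_port {
	uint16_t sltctl;     /* Slot Control value left behind by firmware */
	bool event_pending;  /* e.g. a command-complete event is latched */
};

/* Assumption: the port drives INTx only if both enables are set. */
static inline bool port_asserts_intx(const struct model_port *p)
{
	return p->event_pending &&
	       (p->sltctl & PCI_EXP_SLTCTL_CCIE) &&
	       (p->sltctl & PCI_EXP_SLTCTL_HPIE);
}

/* Mirrors the effect of clearing CCIE | HPIE in Slot Control. */
static inline void disable_hp_irqs(struct model_port *p)
{
	p->sltctl &= (uint16_t)~(PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
	assert(!port_asserts_intx(p));
}

/*
 * Returns true if the shared line is asserted, with no handler for the
 * port, at the moment another driver (e.g. NVMe) enables the line.
 */
static inline bool storm_when_line_enabled(struct model_port p,
					   bool early_disable)
{
	bool unhandled;

	if (early_disable)
		disable_hp_irqs(&p);       /* early PCI init, before any probe */

	unhandled = port_asserts_intx(&p); /* nvme_probe() enables the line */

	disable_hp_irqs(&p);               /* portdrv probe: after the window */
	return unhandled;
}
```

With a port left in the state described above (both enables set, event pending), storm_when_line_enabled() returns true when the disabling is left to the portdrv step and false when it is done early.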

Thanks,
Feng

> As for the change,
>
> +	if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
> +		pcie_capability_clear_word(dev, PCI_EXP_SLTCTL,
> +			PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_HPIE);
>
> CONFIG_HOTPLUG_PCI_PCIE is always enabled on our platform and in many
> distros, so I guess the check needs to be removed. With the check removed
> we see the 1 second wait again, so the waiting logic from patch 1/2 would
> still be needed?
>
> Thanks,
> Feng