lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z8FXyVyMyAe4_bI3@U-2FWC9VHC-2323.local>
Date: Fri, 28 Feb 2025 14:29:29 +0800
From: Feng Tang <feng.tang@...ux.alibaba.com>
To: Lukas Wunner <lukas@...ner.de>, rafael@...nel.org,
	Bjorn Helgaas <bhelgaas@...gle.com>
Cc: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@...ux.intel.com>,
	Liguang Zhang <zhangliguang@...ux.alibaba.com>,
	Guanghui Feng <guanghuifeng@...ux.alibaba.com>,
	Markus Elfring <Markus.Elfring@....de>, lkp@...el.com,
	Jonathan Cameron <Jonathan.Cameron@...wei.com>,
	ilpo.jarvinen@...ux.intel.com, linux-pci@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 2/4] PCI/portdrv: Add necessary wait for disabling
 hotplug events

On Tue, Feb 25, 2025 at 12:42:04PM +0800, Feng Tang wrote:
 
> > 
> > > There might be some misunderstaning here :), I responded in
> > > https://lore.kernel.org/lkml/Z6LRAozZm1UfgjqT@U-2FWC9VHC-2323.local/
> > > that your suggestion could solve our issue.
> > 
> > Well, could you test it please?
> 
> I don't have the hardware right now, will try to get from firmware
> developers. But from code logic, your suggestion can surely solve the
> issue unless I still miss something. From bug report (also commit log),
> the first PCIe hotplug command issued is here, and the second command
> comes from pciehp driver. In our kernel config, CONFIG_HOTPLUG_PCI_PCIE=y,
> so the first command won't happen, and all following commands come
> from pciehp driver, which setup its own waiting for command logic.
> 

Hi Lucas,

We just tried the patch on the hardware and initial 5.10 kernel, and
the problem cannot be reproduced, as the first PCIe hotplug command
of disabling CCIE and HPIE was not issued. 

Should I post a new version patch with your suggestion? You analysis
in previous sounded sane to me. Also for the original context, if the
BIOS has enabled the hotplug interrupt, it has been there since OS
boot for quite some time, this solution just affects a very small
time window from here to the loading of pciehp driver.  

Also I would like to separate this patch from the patch dealing the
nomsi irq storm issue. How do you think?

Thanks,
Feng

> > > 
> > > The code comment from 2bd50dd800b5 is:
> > > 
> > > 	/*
> > > 	 * Disable hot-plug interrupts in case they have been
> > > 	 * enabled by the BIOS and the hot-plug service driver
> > > 	 * is not loaded.
> > > 	 */
> > > 
> > > The "is not loaded" has 2 possible meanings:
> > > 1. the pciehp driver is not loaded yet at this point inside
> > >    get_port_device_capability(), and will be loaded later
> > > 2. the pciehp will never be loaded, i.e. CONFIG_HOTPLUG_PCI_PCIE=n 
> > > 
> > > If it's case 2, your suggestion can solve it nicely, but for case 1,
> > > we may have to keep the interrupt disabling.
> > 
> > The pciehp driver cannot be bound to the PCIe port when
> > get_port_device_capability() is running.  Because at that point,
> > portdrv is still figuring out which capabilities the port has and
> > it will subsequently instantiate the port service devices to which
> > the drivers (such as pciehp) will bind.
>  
> Yes, the time window between here and the following initialization of
> pciehp service driver is very small, and your suggestion sounds quite
> safe to me.
> 
> > So in that sense, case 1 cannot be what the code comment is
> > referring to.
> 
> Hi Rafel,
> 
> Could you help to confirm this?
> 
> > 
> > My point is that if CONFIG_HOTPLUG_PCI_PCIE=y, there may indeed be
> > another write to the Slot Control register before the command written
> > by portdrv has been completed.  Because pciehp will write to the
> > register on probe.  But in this case, there shouldn't be a need for
> > portdrv to quiesce the interrupt because pciehp will do that anyway
> > shortly afterwards.
> > 
> > And in the CONFIG_HOTPLUG_PCI_PCIE=n case, pciehp will not quiesce
> > the interrupt, so portdrv has to do that.  I believe that's what
> > the code comment is referring to.  It should be safe to just write
> > to the Slot Control register without waiting for the command to
> > complete because there shouldn't be another Slot Control write
> > afterwards (not by pciehp anyway).
> > 
> > If making the Slot Control write in portdrv conditional on
> > CONFIG_HOTPLUG_PCI_PCIE=n does unexpectedly *not* solve the issue,
> > please try to find out where the second Slot Control write is
> > coming from.  E.g. you could amend pcie_capability_write_word()
> > with something like:
> > 
> > 	if (pos == PCI_EXP_SLTCTL) {
> > 		pci_info(dev, "Writing %04hx SltCtl\n", val);
> > 		dump_stack();
> > 	}
> 
> Thanks for the suggestion, and we added similar debug to figure
> out them.
> 
> Thanks,
> Feng

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ