lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z6QqGRNQ0UQZSKBB@U-2FWC9VHC-2323.local>
Date: Thu, 6 Feb 2025 11:18:49 +0800
From: Feng Tang <feng.tang@...ux.alibaba.com>
To: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@...ux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>,
	Jonathan Cameron <Jonthan.Cameron@...wei.com>,
	ilpo.jarvinen@...ux.intel.com, linux-pci@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] PCI/portdrv: Add necessary delay for disabling
 hotplug events

Hi Sathyanarayanan,

On Wed, Feb 05, 2025 at 10:26:59AM -0800, Sathyanarayanan Kuppuswamy wrote:
> 
> On 2/3/25 9:37 PM, Feng Tang wrote:
> > According to PCIe 6.1 spec, section 6.7.3.2, software need to wait at
> > least 1 second for the command-complete event, before resending the cmd
> > or sending a new cmd.
> > 
> > Currently get_port_device_capability() sends slot control cmd to disable
> > PCIe hotplug interrupts without waiting for its completion and there was
> > real problem reported for the lack of waiting.
> 
> Can you include the error log associated with this issue? What is the
> actual issue you are seeing and in which hardware?

For this one, we don't have specific log, as it was raised by firmware
developer, as in https://lore.kernel.org/lkml/Z6LRAozZm1UfgjqT@U-2FWC9VHC-2323.local/

When handling PCI hotplug problem, they hit issue and found their state
machine corrupted , and back traced to OS. They didn't expect to receive
2 link control commands at almost the same time, which doesn't comply to
pcie spec, and normally the handling of one command will take some time
in BIOS, though not as long as 1 second. The HW is an ARM server.

I will try to add these info to commit log in next version.

> 
> > 
> > Add the necessary wait to comply with PCIe spec. The waiting logic refers
> > existing pcie_poll_cmd().
> > 
> > Signed-off-by: Feng Tang <feng.tang@...ux.alibaba.com>
> > ---
> >   drivers/pci/pci.h          |  2 ++
> >   drivers/pci/pcie/portdrv.c | 33 +++++++++++++++++++++++++++++++--
> >   2 files changed, 33 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > index 01e51db8d285..c1e234d1b81d 100644
> > --- a/drivers/pci/pci.h
> > +++ b/drivers/pci/pci.h
> > @@ -759,12 +759,14 @@ static inline void pcie_ecrc_get_policy(char *str) { }
> >   #ifdef CONFIG_PCIEPORTBUS
> >   void pcie_reset_lbms_count(struct pci_dev *port);
> >   int pcie_lbms_count(struct pci_dev *port, unsigned long *val);
> > +void pcie_disable_hp_interrupts_early(struct pci_dev *dev);
> >   #else
> >   static inline void pcie_reset_lbms_count(struct pci_dev *port) {}
> >   static inline int pcie_lbms_count(struct pci_dev *port, unsigned long *val)
> >   {
> >   	return -EOPNOTSUPP;
> >   }
> > +static inline void pcie_disable_hp_interrupts_early(struct pci_dev *dev) {}
> >   #endif
> >   struct pci_dev_reset_methods {
> > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> > index 02e73099bad0..16010973bfe2 100644
> > --- a/drivers/pci/pcie/portdrv.c
> > +++ b/drivers/pci/pcie/portdrv.c
> > @@ -18,6 +18,7 @@
> >   #include <linux/string.h>
> >   #include <linux/slab.h>
> >   #include <linux/aer.h>
> > +#include <linux/delay.h>
> >   #include "../pci.h"
> >   #include "portdrv.h"
> > @@ -205,6 +206,35 @@ static int pcie_init_service_irqs(struct pci_dev *dev, int *irqs, int mask)
> >   	return 0;
> >   }
> > +static int pcie_wait_sltctl_cmd_raw(struct pci_dev *pdev)
> > +{
> > +	u16 slot_status;
> > +	/* 1000 ms, according toPCIe spec 6.1, section 6.7.3.2 */
> > +	int timeout = 1000;
> > +
> > +	do {
> > +		pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
> > +		if (slot_status & PCI_EXP_SLTSTA_CC) {
> > +			pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
> > +						   PCI_EXP_SLTSTA_CC);
> > +			return 0;
> > +		}
> > +		msleep(10);
> > +		timeout -= 10;
> > +	} while (timeout);
> > +
> > +	/* Timeout */
> > +	return  -1;
> > +}
> 
> May be this logic can be simplified using readl_poll_timeout()?

Seems this is what exactly I needed :) Many thanks for the suggestion!

- Feng

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ