lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231018200826.GA1371652@bhelgaas>
Date: Wed, 18 Oct 2023 15:08:26 -0500
From: Bjorn Helgaas <helgaas@...nel.org>
To: Ido Schimmel <idosch@...dia.com>
Cc: netdev@...r.kernel.org, linux-pci@...r.kernel.org, davem@...emloft.net,
	kuba@...nel.org, pabeni@...hat.com, edumazet@...gle.com,
	bhelgaas@...gle.com, alex.williamson@...hat.com, lukas@...ner.de,
	petrm@...dia.com, jiri@...dia.com, mlxsw@...dia.com
Subject: Re: [RFC PATCH net-next 05/12] PCI: Add device-specific reset for
 NVIDIA Spectrum devices

On Tue, Oct 17, 2023 at 10:42:50AM +0300, Ido Schimmel wrote:
> The PCIe specification defines two methods to trigger a hot reset across
> a link: Bus reset and link disablement (r6.0.1, sec 7.1, sec 6.6.1). In
> the first method, the Secondary Bus Reset (SBR) bit in the Bridge
> Control Register of the Downstream Port is asserted for at least 1ms
> (r6.0.1, sec 7.5.1.3.13). In the second method, the Link Disable bit in
> the Link Control Register of the Downstream Port is asserted and then
> cleared to disable and enable the link (r6.0.1, sec 7.5.3.7).
> 
> While the two methods are identical from the perspective of the
> Downstream device, they are different as far as the host is concerned.
> In the first method, the Link Training and Status State Machine (LTSSM)
> of the Downstream Port is expected to be in the Hot Reset state as long
> as the SBR bit is asserted. In the second method, the LTSSM of the
> Downstream Port is expected to be in the Disabled state as long as the
> Link Disable bit is asserted.
> 
> This above difference is of importance because the specification
> requires the LTTSM to exit from the Hot Reset state to the Detect state
> within a 2ms timeout (r6.0.1, sec 4.2.7.11).

I don't read 4.2.7.11 quite that way.  Here's the text (from r6.0):

  • Lanes that were directed by a higher Layer to initiate Hot
    Reset:

    ◦ All Lanes in the configured Link transmit TS1 Ordered Sets
      with the Hot Reset bit asserted and the configured Link and
      Lane numbers.

    ◦ If two consecutive TS1 Ordered Sets are received on any
      Lane with the Hot Reset bit asserted and configured Link
      and Lane numbers, then:

      ▪ LinkUp = 0b (False)

      ▪ If no higher Layer is directing the Physical Layer to
        remain in Hot Reset, the next state is Detect

      ▪ Otherwise, all Lanes in the configured Link continue to
	transmit TS1 Ordered Sets with the Hot Reset bit asserted
	and the configured Link and Lane numbers.

    ◦ Otherwise, after a 2 ms timeout next state is Detect.

I assume that SBR being set constitutes a "higher Layer directing the
Physical Layer to remain in Hot Reset," so I would read this as saying
the LTSSM stays in Hot Reset as long as SBR is set.  Then, *after* a
2 ms timeout (not *within* 2 ms), the next state is Detect.

> NVIDIA Spectrum devices cannot guarantee it and a host enforcing
> such a behavior might fail to communicate with the device after
> issuing a Secondary Bus Reset.

I don't quite follow this.  What behavior is the host enforcing here?
I guess you're doing an SBR, and the Spectrum device doesn't respond
as expected afterwards?

It looks like pci_reset_secondary_bus() asserts SBR for at least
2 ms.  Then pci_bridge_wait_for_secondary_bus() should wait before
accessing the device, but maybe we don't wait long enough?

I guess this ends up back at d3cold_delay as suggested by Lukas.

> With the link disablement method, the host can leave the link
> disabled for enough time to allow the device to undergo a hot reset
> and reach the Detect state. After enabling the link, the host will
> exit from the Disabled state to Detect state (r6.0.1, sec 4.2.7.9)
> and observe that the device is already in the Detect state.
> 
> The PCI core only implements the first method, which might not work with
> NVIDIA Spectrum devices on certain hosts, as explained above. Therefore,
> implement the link disablement method as a device-specific method for
> NVIDIA Spectrum devices. Specifically, disable the link, wait for 500ms,
> enable the link and then wait for the device to become accessible.
> 
> Signed-off-by: Ido Schimmel <idosch@...dia.com>
> ---
>  drivers/pci/quirks.c | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 23f6bd2184e2..a6e308bb934c 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4182,6 +4182,31 @@ static int reset_hinic_vf_dev(struct pci_dev *pdev, bool probe)
>  	return 0;
>  }
>  
> +#define PCI_DEVICE_ID_MELLANOX_SPECTRUM		0xcb84
> +#define PCI_DEVICE_ID_MELLANOX_SPECTRUM2	0xcf6c
> +#define PCI_DEVICE_ID_MELLANOX_SPECTRUM3	0xcf70
> +#define PCI_DEVICE_ID_MELLANOX_SPECTRUM4	0xcf80
> +
> +static int reset_mlx(struct pci_dev *pdev, bool probe)
> +{
> +	struct pci_dev *bridge = pdev->bus->self;
> +
> +	if (probe)
> +		return 0;
> +
> +	/*
> +	 * Disable the link on the Downstream port in order to trigger a hot
> +	 * reset in the Downstream device. Wait for 500ms before enabling the
> +	 * link so that the firmware on the device will have enough time to
> +	 * transition the Upstream port to the Detect state.
> +	 */
> +	pcie_capability_set_word(bridge, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_LD);
> +	msleep(500);
> +	pcie_capability_clear_word(bridge, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_LD);
> +
> +	return pci_bridge_wait_for_secondary_bus(bridge, "link toggle");
> +}
> +
>  static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
>  	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82599_SFP_VF,
>  		 reset_intel_82599_sfp_virtfn },
> @@ -4197,6 +4222,10 @@ static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
>  		reset_chelsio_generic_dev },
>  	{ PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HINIC_VF,
>  		reset_hinic_vf_dev },
> +	{ PCI_VENDOR_ID_MELLANOX, PCI_DEVICE_ID_MELLANOX_SPECTRUM, reset_mlx },
> +	{ PCI_VENDOR_ID_MELLANOX, PCI_DEVICE_ID_MELLANOX_SPECTRUM2, reset_mlx },
> +	{ PCI_VENDOR_ID_MELLANOX, PCI_DEVICE_ID_MELLANOX_SPECTRUM3, reset_mlx },
> +	{ PCI_VENDOR_ID_MELLANOX, PCI_DEVICE_ID_MELLANOX_SPECTRUM4, reset_mlx },
>  	{ 0 }
>  };
>  
> -- 
> 2.40.1
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ