[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231018200826.GA1371652@bhelgaas>
Date: Wed, 18 Oct 2023 15:08:26 -0500
From: Bjorn Helgaas <helgaas@...nel.org>
To: Ido Schimmel <idosch@...dia.com>
Cc: netdev@...r.kernel.org, linux-pci@...r.kernel.org, davem@...emloft.net,
kuba@...nel.org, pabeni@...hat.com, edumazet@...gle.com,
bhelgaas@...gle.com, alex.williamson@...hat.com, lukas@...ner.de,
petrm@...dia.com, jiri@...dia.com, mlxsw@...dia.com
Subject: Re: [RFC PATCH net-next 05/12] PCI: Add device-specific reset for
NVIDIA Spectrum devices
On Tue, Oct 17, 2023 at 10:42:50AM +0300, Ido Schimmel wrote:
> The PCIe specification defines two methods to trigger a hot reset across
> a link: Bus reset and link disablement (r6.0.1, sec 7.1, sec 6.6.1). In
> the first method, the Secondary Bus Reset (SBR) bit in the Bridge
> Control Register of the Downstream Port is asserted for at least 1ms
> (r6.0.1, sec 7.5.1.3.13). In the second method, the Link Disable bit in
> the Link Control Register of the Downstream Port is asserted and then
> cleared to disable and enable the link (r6.0.1, sec 7.5.3.7).
>
> While the two methods are identical from the perspective of the
> Downstream device, they are different as far as the host is concerned.
> In the first method, the Link Training and Status State Machine (LTSSM)
> of the Downstream Port is expected to be in the Hot Reset state as long
> as the SBR bit is asserted. In the second method, the LTSSM of the
> Downstream Port is expected to be in the Disabled state as long as the
> Link Disable bit is asserted.
>
> This above difference is of importance because the specification
> requires the LTTSM to exit from the Hot Reset state to the Detect state
> within a 2ms timeout (r6.0.1, sec 4.2.7.11).
I don't read 4.2.7.11 quite that way. Here's the text (from r6.0):
• Lanes that were directed by a higher Layer to initiate Hot
Reset:
◦ All Lanes in the configured Link transmit TS1 Ordered Sets
with the Hot Reset bit asserted and the configured Link and
Lane numbers.
◦ If two consecutive TS1 Ordered Sets are received on any
Lane with the Hot Reset bit asserted and configured Link
and Lane numbers, then:
▪ LinkUp = 0b (False)
▪ If no higher Layer is directing the Physical Layer to
remain in Hot Reset, the next state is Detect
▪ Otherwise, all Lanes in the configured Link continue to
transmit TS1 Ordered Sets with the Hot Reset bit asserted
and the configured Link and Lane numbers.
◦ Otherwise, after a 2 ms timeout next state is Detect.
I assume that SBR being set constitutes a "higher Layer directing the
Physical Layer to remain in Hot Reset," so I would read this as saying
the LTSSM stays in Hot Reset as long as SBR is set. Then, *after* a
2 ms timeout (not *within* 2 ms), the next state is Detect.
> NVIDIA Spectrum devices cannot guarantee it and a host enforcing
> such a behavior might fail to communicate with the device after
> issuing a Secondary Bus Reset.
I don't quite follow this. What behavior is the host enforcing here?
I guess you're doing an SBR, and the Spectrum device doesn't respond
as expected afterwards?
It looks like pci_reset_secondary_bus() asserts SBR for at least
2 ms. Then pci_bridge_wait_for_secondary_bus() should wait before
accessing the device, but maybe we don't wait long enough?
I guess this ends up back at d3cold_delay as suggested by Lukas.
> With the link disablement method, the host can leave the link
> disabled for enough time to allow the device to undergo a hot reset
> and reach the Detect state. After enabling the link, the host will
> exit from the Disabled state to Detect state (r6.0.1, sec 4.2.7.9)
> and observe that the device is already in the Detect state.
>
> The PCI core only implements the first method, which might not work with
> NVIDIA Spectrum devices on certain hosts, as explained above. Therefore,
> implement the link disablement method as a device-specific method for
> NVIDIA Spectrum devices. Specifically, disable the link, wait for 500ms,
> enable the link and then wait for the device to become accessible.
>
> Signed-off-by: Ido Schimmel <idosch@...dia.com>
> ---
> drivers/pci/quirks.c | 29 +++++++++++++++++++++++++++++
> 1 file changed, 29 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 23f6bd2184e2..a6e308bb934c 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4182,6 +4182,31 @@ static int reset_hinic_vf_dev(struct pci_dev *pdev, bool probe)
> return 0;
> }
>
> +#define PCI_DEVICE_ID_MELLANOX_SPECTRUM 0xcb84
> +#define PCI_DEVICE_ID_MELLANOX_SPECTRUM2 0xcf6c
> +#define PCI_DEVICE_ID_MELLANOX_SPECTRUM3 0xcf70
> +#define PCI_DEVICE_ID_MELLANOX_SPECTRUM4 0xcf80
> +
> +static int reset_mlx(struct pci_dev *pdev, bool probe)
> +{
> + struct pci_dev *bridge = pdev->bus->self;
> +
> + if (probe)
> + return 0;
> +
> + /*
> + * Disable the link on the Downstream port in order to trigger a hot
> + * reset in the Downstream device. Wait for 500ms before enabling the
> + * link so that the firmware on the device will have enough time to
> + * transition the Upstream port to the Detect state.
> + */
> + pcie_capability_set_word(bridge, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_LD);
> + msleep(500);
> + pcie_capability_clear_word(bridge, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_LD);
> +
> + return pci_bridge_wait_for_secondary_bus(bridge, "link toggle");
> +}
> +
> static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
> { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82599_SFP_VF,
> reset_intel_82599_sfp_virtfn },
> @@ -4197,6 +4222,10 @@ static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
> reset_chelsio_generic_dev },
> { PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HINIC_VF,
> reset_hinic_vf_dev },
> + { PCI_VENDOR_ID_MELLANOX, PCI_DEVICE_ID_MELLANOX_SPECTRUM, reset_mlx },
> + { PCI_VENDOR_ID_MELLANOX, PCI_DEVICE_ID_MELLANOX_SPECTRUM2, reset_mlx },
> + { PCI_VENDOR_ID_MELLANOX, PCI_DEVICE_ID_MELLANOX_SPECTRUM3, reset_mlx },
> + { PCI_VENDOR_ID_MELLANOX, PCI_DEVICE_ID_MELLANOX_SPECTRUM4, reset_mlx },
> { 0 }
> };
>
> --
> 2.40.1
>
Powered by blists - more mailing lists