[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5ecc588d-6397-37e4-3104-a32a639312f0@mellanox.com>
Date: Tue, 14 Jul 2020 13:47:15 +0300
From: Aya Levin <ayal@...lanox.com>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: David Miller <davem@...emloft.net>, kuba@...nel.org,
saeedm@...lanox.com, mkubecek@...e.cz, linux-pci@...r.kernel.org,
netdev@...r.kernel.org, tariqt@...lanox.com,
alexander.h.duyck@...ux.intel.com, Jason Gunthorpe <jgg@...dia.com>
Subject: Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering
On 7/9/2020 2:16 AM, Bjorn Helgaas wrote:
> On Sun, Jul 08, 2040 at 11:22:12AM +0300, Aya Levin wrote:
>> On 7/6/2020 10:49 PM, David Miller wrote:
>>> From: Aya Levin <ayal@...lanox.com>
>>> Date: Mon, 6 Jul 2020 16:00:59 +0300
>>>
>>>> Assuming the discussions with Bjorn will conclude in a well-trusted
>>>> API that ensures relaxed ordering in enabled, I'd still like a method
>>>> to turn off relaxed ordering for performance debugging sake.
>>>> Bjorn highlighted the fact that the PCIe sub system can only offer a
>>>> query method. Even if theoretically a set API will be provided, this
>>>> will not fit a netdev debugging - I wonder if CPU vendors even support
>>>> relaxed ordering set/unset...
>>>> On the driver's side relaxed ordering is an attribute of the mkey and
>>>> should be available for configuration (similar to number of CPU
>>>> vs. number of channels).
>>>> Based on the above, and binding the driver's default relaxed ordering
>>>> to the return value from pcie_relaxed_ordering_enabled(), may I
>>>> continue with previous direction of a private-flag to control the
>>>> client side (my driver) ?
>>>
>>> I don't like this situation at all.
>>>
>>> If RO is so dodgy that it potentially needs to be disabled, that is
>>> going to be an issue not just with networking devices but also with
>>> storage and other device types as well.
>>>
>>> Will every device type have a custom way to disable RO, thus
>>> inconsistently, in order to accomodate this?
>>>
>>> That makes no sense and is a terrible user experience.
>>>
>>> That's why the knob belongs generically in PCI or similar.
>>>
>> Hi Bjorn,
>>
>> Mellanox NIC supports relaxed ordering operation over DMA buffers.
>> However for debug prepossess we must have a chicken bit to disable
>> relaxed ordering on a specific system without effecting others in
>> run-time. In order to meet this requirement, I added a netdev
>> private-flag to ethtool for set RO API.
>>
>> Dave raised a concern regarding embedding relaxed ordering set API
>> per system (networking, storage and others). We need the ability to
>> manage relaxed ordering in a unify manner. Could you please define a
>> PCI sub-system solution to meet this requirement?
>
> I agree, this is definitely a mess. Let me just outline what I think
> we have today and what we're missing.
>
> - On the hardware side, device use of Relaxed Ordering is controlled
> by the Enable Relaxed Ordering bit in the PCIe Device Control
> register (or the PCI-X Command register). If set, the device is
> allowed but not required to set the Relaxed Ordering bit in
> transactions it initiates (PCIe r5.0, sec 7.5.3.4; PCI-X 2.0, sec
> 7.2.3).
>
> I suspect there may be device-specific controls, too, because [1]
> claims to enable/disable Relaxed Ordering but doesn't touch the
> PCIe Device Control register. Device-specific controls are
> certainly allowed, but of course it would be up to the driver, and
> the device cannot generate TLPs with Relaxed Ordering unless the
> architected PCIe Enable Relaxed Ordering bit is *also* set.
>
> - Platform firmware can enable Relaxed Ordering for a device either
> before handoff to the OS or via the _HPX ACPI method.
>
> - The PCI core never enables Relaxed Ordering itself except when
> applying _HPX.
>
> - At enumeration-time, the PCI core disables Relaxed Ordering in
> pci_configure_relaxed_ordering() if the device is below a Root
> Port that has a quirk indicating an erratum. This quirk currently
> includes many Intel Root Ports, but not all, and is an ongoing
> maintenance problem.
>
> - The PCI core provides pcie_relaxed_ordering_enabled() which tells
> you whether Relaxed Ordering is enabled. Only used by cxgb4 and
> csio, which use that information to fill in Ingress Queue
> Commands.
>
> - The PCI core does not provide a driver interface to enable or
> disable Relaxed Ordering.
>
> - Some drivers disable Relaxed Ordering themselves: mtip32xx,
> netup_unidvb, tg3, myri10ge (oddly, only if CONFIG_MYRI10GE_DCA),
> tsi721, kp2000_pcie.
>
> - Some drivers enable Relaxed Ordering themselves: niu, tegra.
>
> What are we missing and what should the PCI core do?
>
> - Currently the Enable Relaxed Ordering bit depends on what firmware
> did. Maybe the PCI core should always clear it during
> enumeration?
>
> - The PCI core should probably have a driver interface like
> pci_set_relaxed_ordering(dev, enable) that can set or clear the
> architected PCI-X or PCIe Enable Relaxed Ordering bit.
>
> - Maybe there should be a kernel command-line parameter like
> "pci=norelax" that disables Relaxed Ordering for every device and
> prevents pci_set_relaxed_ordering() from enabling it.
>
> I'm mixed on this because these tend to become folklore about how
> to "fix" problems and we end up with systems that don't work
> unless you happen to find the option on the web. For debugging
> issues, it might be enough to disable Relaxed Ordering using
> setpci, e.g., "setpci -s02:00.0 CAP_EXP+8.w=0"
>
> [1] https://lore.kernel.org/netdev/20200623195229.26411-11-saeedm@mellanox.com/
>
Hi Bjorn,
Thanks for the detailed reply. From initial testing I can say that
turning off the relaxed ordering on the PCI (setpci -s02:00.0
CAP_EXP+8.w=0) is the chicken bit I was looking for.
This lower the risk of depending on pcie_relaxed_ordering_enabled(). I
will update my patch and resubmit.
Thanks,
Aya
Powered by blists - more mailing lists