Message-ID: <082c6bfe-5146-c213-9220-65177717c342@mellanox.com>
Date: Wed, 24 Jun 2020 10:34:40 +0300
From: Aya Levin <ayal@...lanox.com>
To: Saeed Mahameed <saeedm@...lanox.com>,
"kuba@...nel.org" <kuba@...nel.org>,
Bjorn Helgaas <helgaas@...nel.org>
Cc: "mkubecek@...e.cz" <mkubecek@...e.cz>,
"davem@...emloft.net" <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Tariq Toukan <tariqt@...lanox.com>
Subject: Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering
On 6/24/2020 9:56 AM, Saeed Mahameed wrote:
> On Tue, 2020-06-23 at 14:31 -0700, Jakub Kicinski wrote:
>> On Tue, 23 Jun 2020 12:52:29 -0700 Saeed Mahameed wrote:
>>> From: Aya Levin <ayal@...lanox.com>
>>>
>>> The concept of Relaxed Ordering in the PCI Express environment allows
>>> switches in the path between the Requester and Completer to reorder some
>>> transactions just received before others that were previously enqueued.
>>>
>>> In the ETH driver, there is no question of write integrity since each
>>> memory segment is written only once per cycle. In addition, the driver
>>> doesn't access the memory shared with the hardware until the
>>> corresponding CQE arrives, indicating all PCI transactions are done.
>>
>
> Hi Jakub, sorry I missed your comments on this patch.
>
>> Assuming the device sets the RO bits appropriately, right? Otherwise
>> CQE write could theoretically surpass the data write, no?
>>
>
> Yes HW guarantees correctness of correlated queues and transactions.
>
>>> With relaxed ordering set, traffic on the remote-numa is at the same
>>> level as when on the local numa.
>>
>> Same level of? Achievable bandwidth?
>>
>
> Yes, bandwidth, according to the explanation below. I see that the
> message needs improvement.
>
>>> Running TCP single stream over ConnectX-4 LX, ARM CPU on remote-numa
>>> has 300% improvement in the bandwidth.
>>> With relaxed ordering turned off: BW:10 [GB/s]
>>> With relaxed ordering turned on: BW:40 [GB/s]
>>>
>>> The driver turns relaxed ordering off by default. It exposes 2 boolean
>>> private-flags in ethtool: pci_ro_read and pci_ro_write for user control.
>>>
>>> $ ethtool --show-priv-flags eth2
>>> Private flags for eth2:
>>> ...
>>> pci_ro_read : off
>>> pci_ro_write : off
>>>
>>> $ ethtool --set-priv-flags eth2 pci_ro_write on
>>> $ ethtool --set-priv-flags eth2 pci_ro_read on
>>
>> I think Michal will rightly complain that this does not belong in
>> private flags any more. As (/if?) ARM deployments take a foothold
>> in DC this will become a common setting for most NICs.
>
> Initially we used pcie_relaxed_ordering_enabled() to
> programmatically turn this on or off at boot, but this seems to
> introduce some degradation on some Intel CPUs, since the list of
> faulty Intel CPUs is not up to date. Aya is discussing this with Bjorn.
Adding Bjorn Helgaas
>
> So until we figure this out, we will keep this off by default.
>
> As for the private flags, we want to keep them for performance analysis,
> as we do with all other mlx5 special performance features and flags.
>
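For readers following the pcie_relaxed_ordering_enabled() discussion above, here is a minimal sketch of the kind of check involved. It assumes the helper reduces to testing the Enable Relaxed Ordering bit (bit 4, PCI_EXP_DEVCTL_RELAX_EN = 0x0010) in the PCIe Device Control register. The Python function name and the sample register values are illustrative only and are not taken from the patch.

```python
# Sketch only: models the Enable Relaxed Ordering bit test on a raw
# 16-bit PCIe Device Control register value. Not driver code.

# Bit 4 of the Device Control register is "Enable Relaxed Ordering"
# (named PCI_EXP_DEVCTL_RELAX_EN in the Linux kernel's pci_regs.h).
PCI_EXP_DEVCTL_RELAX_EN = 0x0010


def relaxed_ordering_enabled(devctl: int) -> bool:
    """Return True if the Enable Relaxed Ordering bit is set in devctl."""
    return bool(devctl & PCI_EXP_DEVCTL_RELAX_EN)


if __name__ == "__main__":
    # Hypothetical register values, chosen only to show both outcomes.
    for value in (0x2810, 0x2800):
        state = "on" if relaxed_ordering_enabled(value) else "off"
        print(f"devctl={value:#06x}: relaxed ordering {state}")
```

Note that this bit only reports what the platform has left enabled for the device. As the thread says, the bit can be set on a CPU where relaxed ordering still degrades performance, which is why the driver keeps the feature off by default and leaves the decision to the private flags.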