lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 9 Jun 2021 15:48:27 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Jason Gunthorpe' <jgg@...dia.com>
CC:     'Chuck Lever III' <chuck.lever@...cle.com>,
        Christoph Hellwig <hch@....de>,
        Leon Romanovsky <leon@...nel.org>,
        Doug Ledford <dledford@...hat.com>,
        Avihai Horon <avihaih@...dia.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
        Bart Van Assche <bvanassche@....org>,
        Tom Talpey <tom@...pey.com>,
        Santosh Shilimkar <santosh.shilimkar@...cle.com>,
        Keith Busch <kbusch@...nel.org>,
        Honggang LI <honli@...hat.com>,
        Max Gurtovoy <mgurtovoy@...dia.com>
Subject: RE: [PATCH v2 rdma-next] RDMA/mlx5: Enable Relaxed Ordering by
 default for kernel ULPs

From: Jason Gunthorpe
> Sent: 09 June 2021 16:09
> 
> On Wed, Jun 09, 2021 at 03:05:52PM +0000, David Laight wrote:
> 
> > In principle some writel() could generate PCIe write TLP (going
> > to the target) that have the 'relaxed ordering' bit set.
> 
> In Linux we call this writel_relaxed(), though I know of no
> implementation that sets the RO bit in the TLP based on this, it would
> be semantically correct to do so.
> 
> writel() has strong order requirements and must not generate a RO TLP.

Somewhere I'd forgotten about that :-(
It usually just allows the compiler and cpu hardware re-sequence
the bus cycles.

OTOH I doubt any/many PCIe targets have 'memory' areas that would
benefit from RO write TLP.
Especially since everything is organised to use target issued buffer
copies.

I'm guessing that the benefits from RO are when the writes hit memory
that is on a NUMA node or 'differently cached'.
So writes to once cache line can proceed while earlier writes are
still waiting for the cache-coherency protocol.

>From what I've seen writel() aren't too bad - they are async.
The real problem is readl().
The x86 cpu I have use a separate TLP id (I've forgotten the correct
term) for each cpu core.
So while multiple cpu can (and do) issue concurrent reads, reads from
a single cpu happen one TLP at a time - even though it would be legitimate
for the out-of-order execution unit to issue additional read TLP.
There are times when you really do have to do PIO buffer reads :-(

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ