[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1522200077.7364.85.camel@kernel.crashing.org>
Date: Wed, 28 Mar 2018 12:21:17 +1100
From: Benjamin Herrenschmidt <benh@...nel.crashing.org>
To: Will Deacon <will.deacon@....com>,
Sinan Kaya <okaya@...eaurora.org>
Cc: Arnd Bergmann <arnd@...db.de>, Jason Gunthorpe <jgg@...pe.ca>,
David Laight <David.Laight@...lab.com>,
Oliver <oohall@...il.com>,
"open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)"
<linuxppc-dev@...ts.ozlabs.org>,
"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
Alexander Duyck <alexander.h.duyck@...hat.com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Alexander Duyck <alexander.duyck@...il.com>,
torvalds@...ux-foundation.org
Subject: Re: RFC on writel and writel_relaxed
On Tue, 2018-03-27 at 16:10 +0100, Will Deacon wrote:
> To clarify: are you saying that on x86 you need a wmb() prior to a writel
> if you want that writel to be ordered after prior writes to memory? Is this
> specific to WC memory or some other non-standard attribute?
>
> The only reason we have wmb() inside writel() on arm, arm64 and power is for
> parity with x86 because Linus (CC'd) wanted architectures to order I/O vs
> memory by default so that it was easier to write portable drivers. The
> performance impact of that implicit barrier is non-trivial, but we want the
> driver portability and I went as far as adding generic _relaxed versions for
> the cases where ordering isn't required. You seem to be suggesting that none
> of this is necessary and drivers would already run into problems on x86 if
> they didn't use wmb() explicitly in conjunction with writel, which I find
> hard to believe and is in direct contradiction with the current Linux I/O
> memory model (modulo the broken example in the dma_*mb section of
> memory-barriers.txt).
Another clarification while we are at it ....
All of this only applies to concurrent access by the CPU and the device
to memory allocate with dma_alloc_coherent().
For memory "mapped" into the DMA domain via dma_map_* then an extra
dma_sync_for_* is needed.
In most useful server cases etc... these latter are NOPs, but
architecture without full DMA cache coherency or using swiotlb,
dma_map_* might maintain bounce buffers or play additional cache
flushing tricks.
Cheers,
Ben.
Powered by blists - more mailing lists