Date:   Tue, 27 Mar 2018 23:24:50 -0400
From:   Sinan Kaya <okaya@...eaurora.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>
Cc:     Alexander Duyck <alexander.duyck@...il.com>,
        Will Deacon <will.deacon@....com>,
        Arnd Bergmann <arnd@...db.de>, Jason Gunthorpe <jgg@...pe.ca>,
        David Laight <David.Laight@...lab.com>,
        Oliver <oohall@...il.com>,
        "open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)" 
        <linuxppc-dev@...ts.ozlabs.org>,
        "linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
        Alexander Duyck <alexander.h.duyck@...hat.com>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: RFC on writel and writel_relaxed

On 3/27/2018 10:51 PM, Linus Torvalds wrote:
>> The discussion at hand is about
>>
>>         dma_buffer->foo = 1;                    /* WB */
>>         writel(KICK, DMA_KICK_REGISTER);        /* UC */
> Yes. That certainly is ordered on x86. In fact, afaik it's ordered
> even if that writel() might be of type WC, because that only delays
> writes, it doesn't move them earlier.

Now that we have cleared up the x86 myth, is this guaranteed on all
architectures? We keep getting the IA64 exception as an example. Maybe this
has also been corrected since then.

Jose Abreu says "I don't know about x86 but arc architecture doesn't
have a wmb() in the writel() function (in some configs)".

As long as we have these exceptions, the wmb() calls in common drivers are not
going anywhere, and relaxed-ordering architectures will continue paying a
performance penalty.

I see a 15% performance loss on ARM64 servers using the Intel i40e network
driver with an XL710 adapter. The CPU keeps itself busy doing barriers most of
the time rather than real work, because sequences like this are all over
the place:

         dma_buffer->foo = 1;                    /* WB */
         wmb();
         writel(KICK, DMA_KICK_REGISTER);        /* UC */

I posted several patches last week to remove duplicate barriers on ARM while
trying to make the code friendly with other architectures.

Basically, the patches change the sequence to:

         dma_buffer->foo = 1;                            /* WB */
         wmb();
         writel_relaxed(KICK, DMA_KICK_REGISTER);        /* UC */
         mmiowb();

This is a small step toward better performance until we remove all the
exceptions.
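
To make the duplicate barrier concrete, here is a minimal userspace sketch,
not kernel code. It assumes an architecture whose writel() issues a write
barrier internally, which is the ARM case I am describing. The wmb(),
mmiowb(), writel() and writel_relaxed() below are simplified stand-ins, and
KICK, struct dma_buffer, the barriers counter and the two kick_* helpers are
hypothetical names used only for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

#define KICK 0x1u                      /* hypothetical doorbell value */

struct dma_buffer { uint32_t foo; };   /* stand-in for the WB descriptor memory */

static unsigned int barriers;          /* counts write barriers issued */

/* Userspace stand-ins only; the real kernel primitives are per-architecture. */
static inline void wmb(void)
{
	atomic_thread_fence(memory_order_seq_cst);
	barriers++;
}

static inline void mmiowb(void)
{
	/* Assumed to be a no-op on the architecture being modelled. */
}

static inline void writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;                   /* plain store, no ordering implied */
}

static inline void writel(uint32_t val, volatile uint32_t *addr)
{
	wmb();                         /* assumed: barrier built into writel() */
	writel_relaxed(val, addr);
}

/* Current driver pattern: explicit wmb() followed by writel(). */
unsigned int kick_with_writel(struct dma_buffer *buf, volatile uint32_t *reg)
{
	barriers = 0;
	buf->foo = 1;
	wmb();
	writel(KICK, reg);
	return barriers;
}

/* Proposed pattern: explicit wmb() followed by writel_relaxed(). */
unsigned int kick_with_writel_relaxed(struct dma_buffer *buf,
				      volatile uint32_t *reg)
{
	barriers = 0;
	buf->foo = 1;
	wmb();
	writel_relaxed(KICK, reg);
	mmiowb();
	return barriers;
}
```

With these stand-ins, kick_with_writel() reports two barriers per kick and
kick_with_writel_relaxed() reports one. The counter only models the number of
barrier instructions issued; it says nothing about their actual cost on real
hardware.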

https://www.spinics.net/lists/netdev/msg491842.html
https://www.spinics.net/lists/linux-rdma/msg62434.html
https://www.spinics.net/lists/arm-kernel/msg642336.html

The discussion started around the need for a relaxed API on PPC, and then the
question of why wmb() is needed came up.

Sinan

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
