lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 15 Sep 2020 09:41:12 +0800
From:   Yunsheng Lin <>
To:     Jakub Kicinski <>,
        Huazhong Tan <>
CC:     <>, <>,
        <>, <>,
        <>, <>
Subject: Re: [PATCH net-next 5/6] net: hns3: use writel() to optimize the
 barrier operation

On 2020/9/15 5:45, Jakub Kicinski wrote:
> On Mon, 14 Sep 2020 20:06:56 +0800 Huazhong Tan wrote:
>> From: Yunsheng Lin <>
>> writel() can be used to order I/O vs memory by default when
>> writing portable drivers. Use writel() to replace wmb() +
>> writel_relaxed(), and writel() is dma_wmb() + writel_relaxed()
>> for ARM64, so there is an optimization here because dma_wmb()
>> is a lighter barrier than wmb().
> Cool, although lots of drivers will need a change like this now. 
> And looks like memory-barriers.txt is slightly, eh, not coherent there,
> between the documentation of writeX() and dma_wmb() :S
> 	3. A writeX() by a CPU thread to the peripheral will first wait for the
> 	   completion of all prior writes to memory either issued by, or

"wait for the completion of all prior writes to memory" seems to match the semantics
of writel() here?

> 	   propagated to, the same thread. This ensures that writes by the CPU
> 	   to an outbound DMA buffer allocated by dma_alloc_coherent() will be

"outbound DMA buffer" mapped by the streaming API can also be ordered by the
writel(), Is that what you meant by "not coherent"?

> 	   visible to a DMA engine when the CPU writes to its MMIO control
> 	   register to trigger the transfer.
>  (*) dma_wmb();
>  (*) dma_rmb();
>      These are for use with consistent memory to guarantee the ordering
>      of writes or reads of shared memory accessible to both the CPU and a
>      DMA capable device.
>      For example, consider a device driver that shares memory with a device
>      and uses a descriptor status value to indicate if the descriptor belongs
>      to the device or the CPU, and a doorbell to notify it when new
>      descriptors are available:
> 	if (desc->status != DEVICE_OWN) {
> 		/* do not read data until we own descriptor */
> 		dma_rmb();
> 		/* read/modify data */
> 		read_data = desc->data;
> 		desc->data = write_data;
> 		/* flush modifications before status update */
> 		dma_wmb();
> 		/* assign ownership */
> 		desc->status = DEVICE_OWN;
> 		/* notify device of new descriptors */
> 		writel(DESC_NOTIFY, doorbell);
> 	}
>      The dma_rmb() allows us guarantee the device has released ownership
>      before we read the data from the descriptor, and the dma_wmb() allows
>      us to guarantee the data is written to the descriptor before the device
>      can see it now has ownership.  Note that, when using writel(), a prior
>      wmb() is not needed to guarantee that the cache coherent memory writes
>      have completed before writing to the MMIO region.  The cheaper
>      writel_relaxed() does not provide this guarantee and must not be used
>      here.

I am not sure writel() has any implication here. My interpretation to the above
doc is that dma_wmb() is more appropriate when only coherent/consistent memory
need to be ordered.

If writel() is used, then dma_wmb() or wmb() is unnecessary, see:

commit: 5846581e3563 ("locking/memory-barriers.txt: Fix broken DMA vs. MMIO ordering example")

> .

Powered by blists - more mailing lists