Message-ID: <4141ed72-c359-ca49-d4a5-57a810888083@huawei.com>
Date: Tue, 15 Sep 2020 09:41:12 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Jakub Kicinski <kuba@...nel.org>,
Huazhong Tan <tanhuazhong@...wei.com>
CC: <davem@...emloft.net>, <netdev@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <salil.mehta@...wei.com>,
<yisen.zhuang@...wei.com>, <linuxarm@...wei.com>
Subject: Re: [PATCH net-next 5/6] net: hns3: use writel() to optimize the
barrier operation
On 2020/9/15 5:45, Jakub Kicinski wrote:
> On Mon, 14 Sep 2020 20:06:56 +0800 Huazhong Tan wrote:
>> From: Yunsheng Lin <linyunsheng@...wei.com>
>>
>> writel() can be used to order I/O vs memory by default when
>> writing portable drivers. Use writel() to replace wmb() +
>> writel_relaxed(), and writel() is dma_wmb() + writel_relaxed()
>> for ARM64, so there is an optimization here because dma_wmb()
>> is a lighter barrier than wmb().
>
> Cool, although lots of drivers will need a change like this now.
>
> And looks like memory-barriers.txt is slightly, eh, not coherent there,
> between the documentation of writeX() and dma_wmb() :S
>
> 3. A writeX() by a CPU thread to the peripheral will first wait for the
> completion of all prior writes to memory either issued by, or
The "wait for the completion of all prior writes to memory" wording seems to
match the semantics of writel() here, doesn't it?
> propagated to, the same thread. This ensures that writes by the CPU
> to an outbound DMA buffer allocated by dma_alloc_coherent() will be
An "outbound DMA buffer" mapped by the streaming API can also be ordered by
writel(). Is that what you meant by "not coherent"?
> visible to a DMA engine when the CPU writes to its MMIO control
> register to trigger the transfer.
>
>
>
> (*) dma_wmb();
> (*) dma_rmb();
>
> These are for use with consistent memory to guarantee the ordering
> of writes or reads of shared memory accessible to both the CPU and a
> DMA capable device.
>
> For example, consider a device driver that shares memory with a device
> and uses a descriptor status value to indicate if the descriptor belongs
> to the device or the CPU, and a doorbell to notify it when new
> descriptors are available:
>
> 	if (desc->status != DEVICE_OWN) {
> 		/* do not read data until we own descriptor */
> 		dma_rmb();
>
> 		/* read/modify data */
> 		read_data = desc->data;
> 		desc->data = write_data;
>
> 		/* flush modifications before status update */
> 		dma_wmb();
>
> 		/* assign ownership */
> 		desc->status = DEVICE_OWN;
>
> 		/* notify device of new descriptors */
> 		writel(DESC_NOTIFY, doorbell);
> 	}
>
> The dma_rmb() allows us to guarantee the device has released ownership
> before we read the data from the descriptor, and the dma_wmb() allows
> us to guarantee the data is written to the descriptor before the device
> can see it now has ownership. Note that, when using writel(), a prior
> wmb() is not needed to guarantee that the cache coherent memory writes
> have completed before writing to the MMIO region. The cheaper
> writel_relaxed() does not provide this guarantee and must not be used
> here.
I am not sure writel() has any implication here. My interpretation of the above
doc is that dma_wmb() is more appropriate when only coherent/consistent memory
needs to be ordered.
If writel() is used, then a preceding dma_wmb() or wmb() is unnecessary, see:
commit: 5846581e3563 ("locking/memory-barriers.txt: Fix broken DMA vs. MMIO ordering example")
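
To make the change in this patch concrete, here is a minimal userspace sketch
of the two doorbell sequences. It is only a model, not the hns3 code: the
barrier and MMIO helpers are hypothetical stubs that record the order in which
they are called, and writel() is modelled as dma_wmb() + writel_relaxed(), as
the commit message describes for ARM64. The names struct ring, kick_old(),
kick_new() and trace are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Records the sequence of barrier/MMIO operations, e.g. "wmb;mmio;". */
static char trace[128];

static void log_op(const char *op)
{
	/* Guard against overflowing the trace buffer. */
	assert(strlen(trace) + strlen(op) < sizeof(trace));
	strcat(trace, op);
}

/* Hypothetical stand-ins: they only log, they are NOT real barriers. */
static void wmb(void)     { log_op("wmb;"); }
static void dma_wmb(void) { log_op("dma_wmb;"); }

static void writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
	log_op("mmio;");
}

/* Assumption: models the ARM64 case from the commit message. */
static void writel(uint32_t val, volatile uint32_t *addr)
{
	dma_wmb();
	writel_relaxed(val, addr);
}

/* Hypothetical ring: one descriptor word plus a doorbell register. */
struct ring {
	uint32_t desc;
	volatile uint32_t doorbell;
};

/* Before the patch: explicit heavy barrier, then relaxed MMIO write. */
static void kick_old(struct ring *r, uint32_t val)
{
	r->desc = val;			/* fill descriptor in DMA memory */
	wmb();				/* order descriptor write vs. doorbell */
	writel_relaxed(1, &r->doorbell);
}

/* After the patch: writel() alone provides the ordering. */
static void kick_new(struct ring *r, uint32_t val)
{
	r->desc = val;			/* fill descriptor in DMA memory */
	writel(1, &r->doorbell);	/* implies the lighter dma_wmb() here */
}
```

In this model, kick_old() leaves "wmb;mmio;" in trace while kick_new() leaves
"dma_wmb;mmio;", which is the substitution of a lighter barrier that the
commit message refers to.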
> .
>