lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 9 May 2024 13:41:16 +0200
From: Marek Szyprowski <m.szyprowski@...sung.com>
To: Alexander Lobakin <aleksander.lobakin@...el.com>, Christoph Hellwig
	<hch@....de>
Cc: Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
	Robin Murphy <robin.murphy@....com>, Joerg Roedel <joro@...tes.org>, Will
	Deacon <will@...nel.org>, "Rafael J. Wysocki" <rafael@...nel.org>, Magnus
	Karlsson <magnus.karlsson@...el.com>,
	nex.sw.ncis.osdt.itp.upstreaming@...el.com, bpf@...r.kernel.org,
	netdev@...r.kernel.org, iommu@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 2/7] dma: avoid redundant calls for sync operations

Dear All,

On 07.05.2024 13:20, Alexander Lobakin wrote:
> Quite often, devices do not need dma_sync operations on x86_64 at least.
> Indeed, when dev_is_dma_coherent(dev) is true and
> dev_use_swiotlb(dev) is false, iommu_dma_sync_single_for_cpu()
> and friends do nothing.
>
> However, indirectly calling them when CONFIG_RETPOLINE=y consumes about
> 10% of cycles on a cpu receiving packets from softirq at ~100Gbit rate.
> Even if/when CONFIG_RETPOLINE is not set, there is a cost of about 3%.
>
> Add dev->need_dma_sync boolean and turn it off during the device
> initialization (dma_set_mask()) depending on the setup:
> dev_is_dma_coherent() for the direct DMA, !(sync_single_for_device ||
> sync_single_for_cpu) or the new dma_map_ops flag, %DMA_F_CAN_SKIP_SYNC,
> advertised for non-NULL DMA ops.
> Then later, if/when swiotlb is used for the first time, the flag
> is reset back to on, from swiotlb_tbl_map_single().
>
> On iavf, the UDP trafficgen with XDP_DROP in skb mode test shows
> +3-5% increase for direct DMA.
>
> Suggested-by: Christoph Hellwig <hch@....de> # direct DMA shortcut
> Co-developed-by: Eric Dumazet <edumazet@...gle.com>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@...el.com>
> ---
>   include/linux/device.h      |  4 +++
>   include/linux/dma-map-ops.h | 12 ++++++++
>   include/linux/dma-mapping.h | 53 +++++++++++++++++++++++++++++++----
>   kernel/dma/mapping.c        | 55 +++++++++++++++++++++++++++++--------
>   kernel/dma/swiotlb.c        |  6 ++++
>   5 files changed, 113 insertions(+), 17 deletions(-)


This patch landed in today's linux-next as commit f406c8e4b770 ("dma: 
avoid redundant calls for sync operations"). Unfortunately I found that 
it breaks some of the ARM 32bit boards by forcing skipping DMA sync 
operations on non-coherent systems. This happens because this patch 
hooks dma_need_sync=true initialization into set_dma_mask(), but 
set_dma_mask() is not called from all device drivers, especially from 
those which operates properly with the default 32bit dma mask (like most 
of the platform devices created by the OF layer).

Frankly speaking I have no idea how this should be fixed. I expect that 
there are lots of broken devices after this change, because I don't 
remember that calling set_dma_mask() is mandatory for device drivers.

After adding dma_set_mask(dev, DMA_BIT_MASK(32)) to the drivers relevant 
for my boards the issues are gone, but I'm not sure this is the right 
approach...


> ...

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ