[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <46160534-5003-4809-a408-6b3a3f4921e9@samsung.com>
Date: Thu, 9 May 2024 13:41:16 +0200
From: Marek Szyprowski <m.szyprowski@...sung.com>
To: Alexander Lobakin <aleksander.lobakin@...el.com>, Christoph Hellwig
<hch@....de>
Cc: Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Robin Murphy <robin.murphy@....com>, Joerg Roedel <joro@...tes.org>, Will
Deacon <will@...nel.org>, "Rafael J. Wysocki" <rafael@...nel.org>, Magnus
Karlsson <magnus.karlsson@...el.com>,
nex.sw.ncis.osdt.itp.upstreaming@...el.com, bpf@...r.kernel.org,
netdev@...r.kernel.org, iommu@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 2/7] dma: avoid redundant calls for sync operations
Dear All,
On 07.05.2024 13:20, Alexander Lobakin wrote:
> Quite often, devices do not need dma_sync operations on x86_64 at least.
> Indeed, when dev_is_dma_coherent(dev) is true and
> dev_use_swiotlb(dev) is false, iommu_dma_sync_single_for_cpu()
> and friends do nothing.
>
> However, indirectly calling them when CONFIG_RETPOLINE=y consumes about
> 10% of cycles on a cpu receiving packets from softirq at ~100Gbit rate.
> Even if/when CONFIG_RETPOLINE is not set, there is a cost of about 3%.
>
> Add dev->need_dma_sync boolean and turn it off during the device
> initialization (dma_set_mask()) depending on the setup:
> dev_is_dma_coherent() for the direct DMA, !(sync_single_for_device ||
> sync_single_for_cpu) or the new dma_map_ops flag, %DMA_F_CAN_SKIP_SYNC,
> advertised for non-NULL DMA ops.
> Then later, if/when swiotlb is used for the first time, the flag
> is reset back to on, from swiotlb_tbl_map_single().
>
> On iavf, the UDP trafficgen with XDP_DROP in skb mode test shows
> +3-5% increase for direct DMA.
>
> Suggested-by: Christoph Hellwig <hch@....de> # direct DMA shortcut
> Co-developed-by: Eric Dumazet <edumazet@...gle.com>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@...el.com>
> ---
> include/linux/device.h | 4 +++
> include/linux/dma-map-ops.h | 12 ++++++++
> include/linux/dma-mapping.h | 53 +++++++++++++++++++++++++++++++----
> kernel/dma/mapping.c | 55 +++++++++++++++++++++++++++++--------
> kernel/dma/swiotlb.c | 6 ++++
> 5 files changed, 113 insertions(+), 17 deletions(-)
This patch landed in today's linux-next as commit f406c8e4b770 ("dma:
avoid redundant calls for sync operations"). Unfortunately I found that
it breaks some of the ARM 32bit boards by forcing skipping DMA sync
operations on non-coherent systems. This happens because this patch
hooks dma_need_sync=true initialization into set_dma_mask(), but
set_dma_mask() is not called from all device drivers, especially from
those which operates properly with the default 32bit dma mask (like most
of the platform devices created by the OF layer).
Frankly speaking I have no idea how this should be fixed. I expect that
there are lots of broken devices after this change, because I don't
remember that calling set_dma_mask() is mandatory for device drivers.
After adding dma_set_mask(dev, DMA_BIT_MASK(32)) to the drivers relevant
for my boards the issues are gone, but I'm not sure this is the right
approach...
> ...
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Powered by blists - more mailing lists