Message-ID: <20240126135456.704351-4-aleksander.lobakin@intel.com>
Date: Fri, 26 Jan 2024 14:54:52 +0100
From: Alexander Lobakin <aleksander.lobakin@...el.com>
To: "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>
Cc: Alexander Lobakin <aleksander.lobakin@...el.com>,
Christoph Hellwig <hch@....de>,
Marek Szyprowski <m.szyprowski@...sung.com>,
Robin Murphy <robin.murphy@....com>,
Joerg Roedel <joro@...tes.org>,
Will Deacon <will@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Magnus Karlsson <magnus.karlsson@...el.com>,
Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
Alexander Duyck <alexanderduyck@...com>,
bpf@...r.kernel.org,
netdev@...r.kernel.org,
iommu@...ts.linux.dev,
linux-kernel@...r.kernel.org
Subject: [PATCH net-next 3/7] iommu/dma: avoid expensive indirect calls for sync operations
From: Eric Dumazet <edumazet@...gle.com>
Use the new dma_map_ops::can_skip_sync() callback in IOMMU DMA. It is
enough to check only for the DMA coherence, as SWIOTLB is checked
dynamically later.
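For illustration, a minimal sketch of how the DMA mapping core could
consult this callback before taking the indirect sync path; the
dma_can_skip_sync() helper name below is hypothetical, only the
->can_skip_sync member and dev_is_dma_coherent() come from this series:

/*
 * Hypothetical sketch, not the actual kernel/dma/mapping.c code:
 * ask the DMA ops whether syncs can be elided for this device.
 */
static bool dma_can_skip_sync(struct device *dev)
{
	const struct dma_map_ops *ops = get_dma_ops(dev);

	/* No callback means the core cannot prove syncs are skippable. */
	if (!ops || !ops->can_skip_sync)
		return false;

	/*
	 * For iommu_dma_ops this resolves to dev_is_dma_coherent(dev):
	 * cache-coherent devices need no CPU cache maintenance, while
	 * SWIOTLB bouncing is still detected per-buffer at sync time.
	 */
	return ops->can_skip_sync(dev);
}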
perf profile before the patch:
18.53% [kernel] [k] gq_rx_skb
14.77% [kernel] [k] napi_reuse_skb
8.95% [kernel] [k] skb_release_data
5.42% [kernel] [k] dev_gro_receive
5.37% [kernel] [k] memcpy
<*> 5.26% [kernel] [k] iommu_dma_sync_sg_for_cpu
4.78% [kernel] [k] tcp_gro_receive
<*> 4.42% [kernel] [k] iommu_dma_sync_sg_for_device
4.12% [kernel] [k] ipv6_gro_receive
3.65% [kernel] [k] gq_pool_get
3.25% [kernel] [k] skb_gro_receive
2.07% [kernel] [k] napi_gro_frags
1.98% [kernel] [k] tcp6_gro_receive
1.27% [kernel] [k] gq_rx_prep_buffers
1.18% [kernel] [k] gq_rx_napi_handler
0.99% [kernel] [k] csum_partial
0.74% [kernel] [k] csum_ipv6_magic
0.72% [kernel] [k] free_pcp_prepare
0.60% [kernel] [k] __napi_poll
0.58% [kernel] [k] net_rx_action
0.56% [kernel] [k] read_tsc
<*> 0.50% [kernel] [k] __x86_indirect_thunk_r11
0.45% [kernel] [k] memset
After the patch, the lines marked with <*> no longer show up, and overall
CPU usage looks much better (~60% instead of ~72%):
25.56% [kernel] [k] gq_rx_skb
9.90% [kernel] [k] napi_reuse_skb
7.39% [kernel] [k] dev_gro_receive
6.78% [kernel] [k] memcpy
6.53% [kernel] [k] skb_release_data
6.39% [kernel] [k] tcp_gro_receive
5.71% [kernel] [k] ipv6_gro_receive
4.35% [kernel] [k] napi_gro_frags
4.34% [kernel] [k] skb_gro_receive
3.50% [kernel] [k] gq_pool_get
3.08% [kernel] [k] gq_rx_napi_handler
2.35% [kernel] [k] tcp6_gro_receive
2.06% [kernel] [k] gq_rx_prep_buffers
1.32% [kernel] [k] csum_partial
0.93% [kernel] [k] csum_ipv6_magic
0.65% [kernel] [k] net_rx_action
iavf gains +10% in Mpps on Rx. This also unblocks batch allocations
of XSk buffers when IOMMU is active.
Signed-off-by: Eric Dumazet <edumazet@...gle.com>
Co-developed-by: Alexander Lobakin <aleksander.lobakin@...el.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@...el.com>
---
drivers/iommu/dma-iommu.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 50ccc4f1ef81..86290562eda5 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1720,6 +1720,7 @@ static const struct dma_map_ops iommu_dma_ops = {
.unmap_page = iommu_dma_unmap_page,
.map_sg = iommu_dma_map_sg,
.unmap_sg = iommu_dma_unmap_sg,
+ .can_skip_sync = dev_is_dma_coherent,
.sync_single_for_cpu = iommu_dma_sync_single_for_cpu,
.sync_single_for_device = iommu_dma_sync_single_for_device,
.sync_sg_for_cpu = iommu_dma_sync_sg_for_cpu,
--
2.43.0