Message-ID: <2d13134d-1e5c-4534-8686-c0022caeb36c@arm.com>
Date: Wed, 14 Feb 2024 17:58:30 +0000
From: Robin Murphy <robin.murphy@....com>
To: Alexander Lobakin <aleksander.lobakin@...el.com>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>
Cc: Christoph Hellwig <hch@....de>,
Marek Szyprowski <m.szyprowski@...sung.com>, Joerg Roedel <joro@...tes.org>,
Will Deacon <will@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Magnus Karlsson <magnus.karlsson@...el.com>,
Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
Alexander Duyck <alexanderduyck@...com>, bpf@...r.kernel.org,
netdev@...r.kernel.org, iommu@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next v3 3/7] iommu/dma: avoid expensive indirect calls
for sync operations
On 2024-02-14 4:21 pm, Alexander Lobakin wrote:
> When IOMMU is on, the actual synchronization happens in the same cases
> as with the direct DMA. Advertise %DMA_F_CAN_SKIP_SYNC in IOMMU DMA to
> skip sync ops calls (indirect) for non-SWIOTLB buffers.
>
> perf profile before the patch:
>
> 18.53% [kernel] [k] gq_rx_skb
> 14.77% [kernel] [k] napi_reuse_skb
> 8.95% [kernel] [k] skb_release_data
> 5.42% [kernel] [k] dev_gro_receive
> 5.37% [kernel] [k] memcpy
> <*> 5.26% [kernel] [k] iommu_dma_sync_sg_for_cpu
> 4.78% [kernel] [k] tcp_gro_receive
> <*> 4.42% [kernel] [k] iommu_dma_sync_sg_for_device
> 4.12% [kernel] [k] ipv6_gro_receive
> 3.65% [kernel] [k] gq_pool_get
> 3.25% [kernel] [k] skb_gro_receive
> 2.07% [kernel] [k] napi_gro_frags
> 1.98% [kernel] [k] tcp6_gro_receive
> 1.27% [kernel] [k] gq_rx_prep_buffers
> 1.18% [kernel] [k] gq_rx_napi_handler
> 0.99% [kernel] [k] csum_partial
> 0.74% [kernel] [k] csum_ipv6_magic
> 0.72% [kernel] [k] free_pcp_prepare
> 0.60% [kernel] [k] __napi_poll
> 0.58% [kernel] [k] net_rx_action
> 0.56% [kernel] [k] read_tsc
> <*> 0.50% [kernel] [k] __x86_indirect_thunk_r11
> 0.45% [kernel] [k] memset
>
> After patch, lines with <*> no longer show up, and overall
> cpu usage looks much better (~60% instead of ~72%):
>
> 25.56% [kernel] [k] gq_rx_skb
> 9.90% [kernel] [k] napi_reuse_skb
> 7.39% [kernel] [k] dev_gro_receive
> 6.78% [kernel] [k] memcpy
> 6.53% [kernel] [k] skb_release_data
> 6.39% [kernel] [k] tcp_gro_receive
> 5.71% [kernel] [k] ipv6_gro_receive
> 4.35% [kernel] [k] napi_gro_frags
> 4.34% [kernel] [k] skb_gro_receive
> 3.50% [kernel] [k] gq_pool_get
> 3.08% [kernel] [k] gq_rx_napi_handler
> 2.35% [kernel] [k] tcp6_gro_receive
> 2.06% [kernel] [k] gq_rx_prep_buffers
> 1.32% [kernel] [k] csum_partial
> 0.93% [kernel] [k] csum_ipv6_magic
> 0.65% [kernel] [k] net_rx_action
>
> iavf yields +10% of Mpps on Rx. This also unblocks batched allocations
> of XSk buffers when IOMMU is active.
Acked-by: Robin Murphy <robin.murphy@....com>
> Co-developed-by: Eric Dumazet <edumazet@...gle.com>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@...el.com>
> ---
> drivers/iommu/dma-iommu.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 50ccc4f1ef81..4ab9ac13d362 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1707,7 +1707,8 @@ static size_t iommu_dma_opt_mapping_size(void)
> }
>
> static const struct dma_map_ops iommu_dma_ops = {
> - .flags = DMA_F_PCI_P2PDMA_SUPPORTED,
> + .flags = DMA_F_PCI_P2PDMA_SUPPORTED |
> + DMA_F_CAN_SKIP_SYNC,
> .alloc = iommu_dma_alloc,
> .free = iommu_dma_free,
> .alloc_pages = dma_common_alloc_pages,
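
For readers skimming the thread, here is a rough userspace sketch of the idea in the commit message: the ops table advertises a "can skip sync" flag, and the sync wrapper returns early instead of making the indirect call unless a bounced (SWIOTLB-style) buffer has been mapped. Every name here (model_dma_ops, MODEL_F_CAN_SKIP_SYNC, model_map() and so on) is made up for illustration, and the way bouncing is tracked is an assumption; this is not the code from the series or from the DMA core.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for this model; not the kernel's value. */
#define MODEL_F_CAN_SKIP_SYNC (1u << 0)

/* Simplified ops table: one indirect sync callback plus capability flags. */
struct model_dma_ops {
	unsigned int flags;
	void (*sync_for_cpu)(void *buf, size_t len);
};

/* Simplified device: caches whether sync calls may be skipped. */
struct model_device {
	const struct model_dma_ops *ops;
	bool skip_sync;
};

/* Counts how many times the indirect callback actually ran. */
static unsigned int model_indirect_calls;

static void model_sync_for_cpu(void *buf, size_t len)
{
	(void)buf;
	(void)len;
	model_indirect_calls++;
}

static const struct model_dma_ops model_ops = {
	.flags = MODEL_F_CAN_SKIP_SYNC,
	.sync_for_cpu = model_sync_for_cpu,
};

/* Setup: skipping is only allowed if the ops advertise the flag. */
void model_setup(struct model_device *dev, const struct model_dma_ops *ops)
{
	dev->ops = ops;
	dev->skip_sync = (ops->flags & MODEL_F_CAN_SKIP_SYNC) != 0;
}

/* Mapping: assume a bounced (SWIOTLB-like) buffer always needs real syncs. */
void model_map(struct model_device *dev, bool bounced)
{
	if (bounced)
		dev->skip_sync = false;
}

/* Fast path: a plain branch replaces the indirect call when skipping. */
void model_dma_sync_for_cpu(struct model_device *dev, void *buf, size_t len)
{
	if (dev->skip_sync)
		return;
	dev->ops->sync_for_cpu(buf, len);
}

/* Returns the number of indirect calls made across two syncs. */
unsigned int model_demo(bool bounced)
{
	struct model_device dev;
	char buf[64];

	model_indirect_calls = 0;
	model_setup(&dev, &model_ops);
	model_map(&dev, bounced);
	model_dma_sync_for_cpu(&dev, buf, sizeof(buf));
	model_dma_sync_for_cpu(&dev, buf, sizeof(buf));
	return model_indirect_calls;
}
```

In this model, model_demo(false) makes no indirect calls, while model_demo(true) makes one per sync, which mirrors the "non-SWIOTLB buffers" wording in the commit message.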