Message-ID: <6446d34f9568_338f220872@john.notmuch>
Date:   Mon, 24 Apr 2023 12:06:55 -0700
From:   John Fastabend <john.fastabend@...il.com>
To:     Kal Conley <kal.conley@...tris.com>,
        Björn Töpel <bjorn@...nel.org>,
        Magnus Karlsson <magnus.karlsson@...el.com>,
        Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        John Fastabend <john.fastabend@...il.com>
Cc:     Kal Conley <kal.conley@...tris.com>, netdev@...r.kernel.org,
        bpf@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: RE: [PATCH] xsk: Use pool->dma_pages to check for DMA

Kal Conley wrote:
> Compare pool->dma_pages instead of pool->dma_pages_cnt to check for an
> active DMA mapping. pool->dma_pages needs to be read anyway to access
> the map so this compiles to more efficient code.

Was it noticeable in some sort of performance test?

> 
> Signed-off-by: Kal Conley <kal.conley@...tris.com>
> Acked-by: Magnus Karlsson <magnus.karlsson@...el.com>
> ---
>  include/net/xsk_buff_pool.h | 2 +-
>  net/xdp/xsk_buff_pool.c     | 7 ++++---
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
> index d318c769b445..a8d7b8a3688a 100644
> --- a/include/net/xsk_buff_pool.h
> +++ b/include/net/xsk_buff_pool.h
> @@ -180,7 +180,7 @@ static inline bool xp_desc_crosses_non_contig_pg(struct xsk_buff_pool *pool,
>  	if (likely(!cross_pg))
>  		return false;
>  
> -	return pool->dma_pages_cnt &&
> +	return pool->dma_pages &&
>  	       !(pool->dma_pages[addr >> PAGE_SHIFT] & XSK_NEXT_PG_CONTIG_MASK);
>  }
>  
> diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> index b2df1e0f8153..26f6d304451e 100644
> --- a/net/xdp/xsk_buff_pool.c
> +++ b/net/xdp/xsk_buff_pool.c
> @@ -350,7 +350,7 @@ void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs)
>  {
>  	struct xsk_dma_map *dma_map;
>  
> -	if (pool->dma_pages_cnt == 0)
> +	if (!pool->dma_pages)
>  		return;

This seems to be used in the setup/tear-down paths, so you're optimizing
the control side. Is there a fast path that uses this code? I walked the
ice driver. If it's just setup code we should do whatever is more
readable.

>  
>  	dma_map = xp_find_dma_map(pool);
> @@ -364,6 +364,7 @@ void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs)
>  
>  	__xp_dma_unmap(dma_map, attrs);
>  	kvfree(pool->dma_pages);
> +	pool->dma_pages = NULL;
>  	pool->dma_pages_cnt = 0;
>  	pool->dev = NULL;
>  }
> @@ -503,7 +504,7 @@ static struct xdp_buff_xsk *__xp_alloc(struct xsk_buff_pool *pool)
>  	if (pool->unaligned) {
>  		xskb = pool->free_heads[--pool->free_heads_cnt];
>  		xp_init_xskb_addr(xskb, pool, addr);
> -		if (pool->dma_pages_cnt)
> +		if (pool->dma_pages)
>  			xp_init_xskb_dma(xskb, pool, pool->dma_pages, addr);
>  	} else {
>  		xskb = &pool->heads[xp_aligned_extract_idx(pool, addr)];
> @@ -569,7 +570,7 @@ static u32 xp_alloc_new_from_fq(struct xsk_buff_pool *pool, struct xdp_buff **xd
>  		if (pool->unaligned) {
>  			xskb = pool->free_heads[--pool->free_heads_cnt];
>  			xp_init_xskb_addr(xskb, pool, addr);
> -			if (pool->dma_pages_cnt)
> +			if (pool->dma_pages)
>  				xp_init_xskb_dma(xskb, pool, pool->dma_pages, addr);

Both the _alloc_ cases read the neighboring free_heads_cnt, so you're saving a
load, I guess? This is so deep into micro-optimizing that I'm curious whether
you could measure it.

>  		} else {
>  			xskb = &pool->heads[xp_aligned_extract_idx(pool, addr)];

I'm not actually against optimizing, but here is another idea. Why do we have
to check at all? It seems that if the DMA has been disabled/unmapped, the
driver shouldn't be trying to call xsk_buff_alloc_batch. Then you can just
drop the 'if' check.

It feels to me that the drivers shouldn't even be calling this after unmapping
the DMA. WDYT?
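
For reference, a minimal user-space sketch of the invariant the pointer check
depends on: testing pool->dma_pages is only equivalent to testing
pool->dma_pages_cnt if the pointer is reset to NULL on unmap, which is what the
added "pool->dma_pages = NULL;" line does. Everything below (struct mock_pool,
mock_map, mock_unmap, mapped_by_cnt, mapped_by_ptr) is a hypothetical stand-in
and not the kernel code; calloc/free stand in for the kernel allocator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two relevant fields of struct xsk_buff_pool. */
struct mock_pool {
	uint64_t *dma_pages;     /* per-page DMA addresses, NULL when unmapped */
	uint32_t dma_pages_cnt;  /* number of entries in dma_pages */
};

/* The check used before the patch. */
static bool mapped_by_cnt(const struct mock_pool *p)
{
	return p->dma_pages_cnt != 0;
}

/* The check used after the patch. */
static bool mapped_by_ptr(const struct mock_pool *p)
{
	return p->dma_pages != NULL;
}

/* Assumed setup path: allocate the array and record its length. */
static int mock_map(struct mock_pool *p, uint32_t n)
{
	p->dma_pages = calloc(n, sizeof(*p->dma_pages));
	if (!p->dma_pages)
		return -1;
	p->dma_pages_cnt = n;
	return 0;
}

/* Assumed tear-down path: without the NULL store, mapped_by_ptr() would
 * keep returning true for a freed array. */
static void mock_unmap(struct mock_pool *p)
{
	if (!p->dma_pages)
		return;
	free(p->dma_pages);
	p->dma_pages = NULL;
	p->dma_pages_cnt = 0;
}
```

With both fields cleared in mock_unmap(), the two predicates agree before
mapping, while mapped, and after unmapping.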
