Message-ID: <dda98f40-6964-464c-b468-61fed67e0e96@intel.com>
Date: Thu, 9 Jan 2025 16:44:56 +0100
From: Alexander Lobakin <aleksander.lobakin@...el.com>
To: Furong Xu <0x1207@...il.com>
CC: netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
	Jesper Dangaard Brouer <hawk@...nel.org>,
	Ilias Apalodimas <ilias.apalodimas@...aro.org>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>,
	Paolo Abeni <pabeni@...hat.com>,
	Simon Horman <horms@...nel.org>
Subject: Re: [PATCH net-next v3] page_pool: check for dma_sync_size earlier

From: Furong Xu <0x1207@...il.com>
Date: Mon,  6 Jan 2025 11:02:25 +0800

> Setting dma_sync_size to 0 is not illegal; fec_main.c and ravb_main.c
> already do it.
> We can save a couple of function calls if we check dma_sync_size earlier.
> 
> This is a micro-optimization: about a 0.6% PPS improvement was observed
> on a single Cortex-A53 CPU core with a 64-byte UDP RX traffic test.
> 
> Before this patch:
> The average packet rate over one minute is 234026 packets per second.
> 
> After this patch:
> The average packet rate over one minute is 235537 packets per second.
> 
> Signed-off-by: Furong Xu <0x1207@...il.com>
> ---
> V2 -> V3: Add more details about measurement in commit message
> V2: https://lore.kernel.org/r/20250103082814.3850096-1-0x1207@gmail.com
> 
> V1 -> V2: Add measurement data about performance improvement in commit message
> V1: https://lore.kernel.org/r/20241010114019.1734573-1-0x1207@gmail.com
> ---
>  net/core/page_pool.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 9733206d6406..9bb2d2300d0b 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -458,7 +458,7 @@ page_pool_dma_sync_for_device(const struct page_pool *pool,
>  			      netmem_ref netmem,
>  			      u32 dma_sync_size)
>  {
> -	if (pool->dma_sync && dma_dev_need_sync(pool->p.dev))
> +	if (pool->dma_sync && dma_dev_need_sync(pool->p.dev) && dma_sync_size)

page_pool_dma_sync_for_device() with dma_sync_size == 0, but with
pool->dma_sync set, is a VERY uncommon case. In general, it would happen
only when the device didn't write anything to the buffer.
IOW, this "shortcut" would only help *slowpath* code a bit while
potentially harming really hot functions. Such hot inline helpers are
designed to make the code paths executed 99.999% of the time faster;
we don't care about the remaining 0.001%.
I dunno how you got this +0.6%, but if your driver makes Page Pool
call sync_for_device(0) too often, the problem is in your driver.

>  		__page_pool_dma_sync_for_device(pool, netmem, dma_sync_size);
>  }
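
And if a caller really does know it hit the rare zero-size case, that
check belongs at the call site, not in the shared inline. A sketch,
reusing the made-up names from above:

/* Rare-path caller: test for the unusual condition where it occurs
 * instead of taxing every other user of sync_for_device().
 */
static void recycle_untouched_buf(struct pool *p, uint32_t written)
{
	/* Only request a sync when the device actually wrote data. */
	if (written)
		sync_for_device(p, written);
}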

Thanks,
Olek
