Date:   Tue, 15 Feb 2022 16:41:31 +0100
From:   Jesper Dangaard Brouer <jbrouer@...hat.com>
To:     Joe Damato <jdamato@...tly.com>, netdev@...r.kernel.org,
        kuba@...nel.org, ilias.apalodimas@...aro.org, davem@...emloft.net,
        hawk@...nel.org, saeed@...nel.org, ttoukan.linux@...il.com
Cc:     brouer@...hat.com
Subject: Re: [net-next v5 1/2] page_pool: Add page_pool stat counters


On 14/02/2022 21.02, Joe Damato wrote:
> Add per-cpu per-pool statistics counters for the allocation path of a page
> pool.
> 
> This code is disabled by default and a kernel config option is provided for
> users who wish to enable them.
> 
> The statistics added are:
> 	- fast: successful fast path allocations
> 	- slow: slow path order-0 allocations
> 	- slow_high_order: slow path high order allocations
> 	- empty: ptr ring is empty, so a slow path allocation was forced.
> 	- refill: an allocation which triggered a refill of the cache
> 	- waive: pages obtained from the ptr ring that cannot be added to
> 	  the cache due to a NUMA mismatch.
> 
> Signed-off-by: Joe Damato <jdamato@...tly.com>
> ---
>   include/net/page_pool.h | 18 ++++++++++++++++++
>   net/Kconfig             | 13 +++++++++++++
>   net/core/page_pool.c    | 37 +++++++++++++++++++++++++++++++++----
>   3 files changed, 64 insertions(+), 4 deletions(-)
> 
> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> index 97c3c19..d827ab1 100644
> --- a/include/net/page_pool.h
> +++ b/include/net/page_pool.h
> @@ -135,7 +135,25 @@ struct page_pool {
>   	refcount_t user_cnt;
>   
>   	u64 destroy_cnt;
> +#ifdef CONFIG_PAGE_POOL_STATS
> +	struct page_pool_stats __percpu *stats;
> +#endif
> +};
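
For readers following along: the counter names in the commit message above 
suggest roughly the following per-cpu structure and fast-path increment. 
This is an illustrative sketch inferred from the description, not code from 
the posted patch; the exact field layout and the page_pool_stat_fast_inc() 
helper name are assumptions.

   struct page_pool_stats {
   	u64 fast;		/* successful fast path allocations */
   	u64 slow;		/* slow path order-0 allocations */
   	u64 slow_high_order;	/* slow path high order allocations */
   	u64 empty;		/* ptr ring empty, slow path forced */
   	u64 refill;		/* allocations that triggered a cache refill */
   	u64 waive;		/* ptr-ring pages waived due to NUMA mismatch */
   };

   /* The allocation side of a page_pool is driver/NAPI protected, so a
    * plain this_cpu_inc() on the __percpu pointer is sufficient here.
    */
   static inline void page_pool_stat_fast_inc(struct page_pool *pool)
   {
   #ifdef CONFIG_PAGE_POOL_STATS
   	this_cpu_inc(pool->stats->fast);
   #endif
   }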

You still have to consider cache-line locality, as I have pointed out 
before.
This placement is wrong!

Output from pahole:

  /* --- cacheline 23 boundary (1472 bytes) --- */
  atomic_t                   pages_state_release_cnt; /*  1472     4 */
  refcount_t                 user_cnt;             /*  1476     4 */
  u64                        destroy_cnt;          /*  1480     8 */

Your *stats pointer ends up on a cache line that "remote" CPUs will write 
into (pages_state_release_cnt).
This is why we see a slowdown in the 'bench_page_pool_cross_cpu' test.
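
One way to address this (a sketch only, not a concrete layout proposed in 
this message; the 'alloc' member and its ____cacheline_aligned_in_smp 
annotation are assumed from the page_pool.h of this period) is to keep the 
stats pointer among the fields that are only read on the hot paths, before 
the allocation-side cache line, rather than next to pages_state_release_cnt:

   	/* Read-mostly after setup: the pointer itself is only
   	 * dereferenced, never written, on the fast paths.
   	 */
   #ifdef CONFIG_PAGE_POOL_STATS
   	struct page_pool_stats __percpu *stats;
   #endif

   	/* Allocation-side hot data keeps its own cache line. */
   	struct pp_alloc_cache alloc ____cacheline_aligned_in_smp;

The resulting placement can then be re-checked with pahole, for example 
'pahole -C page_pool net/core/page_pool.o' on a build with debug info.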

--Jesper
