[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ed575c4e-774a-2118-f6bf-c8725d2739e8@redhat.com>
Date: Tue, 15 Feb 2022 16:41:31 +0100
From: Jesper Dangaard Brouer <jbrouer@...hat.com>
To: Joe Damato <jdamato@...tly.com>, netdev@...r.kernel.org,
kuba@...nel.org, ilias.apalodimas@...aro.org, davem@...emloft.net,
hawk@...nel.org, saeed@...nel.org, ttoukan.linux@...il.com
Cc: brouer@...hat.com
Subject: Re: [net-next v5 1/2] page_pool: Add page_pool stat counters
On 14/02/2022 21.02, Joe Damato wrote:
> Add per-cpu per-pool statistics counters for the allocation path of a page
> pool.
>
> This code is disabled by default and a kernel config option is provided for
> users who wish to enable them.
>
> The statistics added are:
> - fast: successful fast path allocations
> - slow: slow path order-0 allocations
> - slow_high_order: slow path high order allocations
> - empty: ptr ring is empty, so a slow path allocation was forced.
> - refill: an allocation which triggered a refill of the cache
> - waive: pages obtained from the ptr ring that cannot be added to
> the cache due to a NUMA mismatch.
>
> Signed-off-by: Joe Damato <jdamato@...tly.com>
> ---
> include/net/page_pool.h | 18 ++++++++++++++++++
> net/Kconfig | 13 +++++++++++++
> net/core/page_pool.c | 37 +++++++++++++++++++++++++++++++++----
> 3 files changed, 64 insertions(+), 4 deletions(-)
>
> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> index 97c3c19..d827ab1 100644
> --- a/include/net/page_pool.h
> +++ b/include/net/page_pool.h
> @@ -135,7 +135,25 @@ struct page_pool {
> refcount_t user_cnt;
>
> u64 destroy_cnt;
> +#ifdef CONFIG_PAGE_POOL_STATS
> + struct page_pool_stats __percpu *stats;
> +#endif
> +};
You still have to consider cache-line locality, as I have pointed out
before.
This placement is wrong!
Output from pahole:
/* --- cacheline 23 boundary (1472 bytes) --- */
atomic_t pages_state_release_cnt; /* 1472 4 */
refcount_t user_cnt; /* 1476 4 */
u64 destroy_cnt; /* 1480 8 */
Your *stats pointer end-up on a cache-line that "remote" CPUs will write
into (pages_state_release_cnt).
This is why we see a slowdown to the 'bench_page_pool_cross_cpu' test.
--Jesper
Powered by blists - more mailing lists