Message-ID: <20191025153353.606e4b0d@carbon>
Date:   Fri, 25 Oct 2019 15:33:53 +0200
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Saeed Mahameed <saeedm@...lanox.com>
Cc:     "David S. Miller" <davem@...emloft.net>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        "ilias.apalodimas@...aro.org" <ilias.apalodimas@...aro.org>,
        brouer@...hat.com
Subject: Re: [PATCH net-next V2 2/3] page_pool: Don't recycle non-reusable pages

On Wed, 23 Oct 2019 19:37:00 +0000
Saeed Mahameed <saeedm@...lanox.com> wrote:

> A page is NOT reusable when at least one of the following is true:
> 1) it was allocated when the system was under memory pressure (page_is_pfmemalloc).
> 2) it belongs to a different NUMA node than pool->p.nid.
> 
> To update pool->p.nid, users should call page_pool_update_nid().
> 
> Holding on to such pages in the pool will hurt consumer performance
> when the pool migrates to a different NUMA node.
> 
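For context, a minimal sketch of what such a page_pool_update_nid()
caller could look like (hypothetical mydrv_* names; the real mlx5 hook
in patch 3/3 may differ):

/* Hypothetical driver NAPI poll: keep the pool's NUMA hint in sync
 * with the CPU that is currently servicing the RX ring.
 */
static int mydrv_napi_poll(struct napi_struct *napi, int budget)
{
	struct mydrv_rq *rq = container_of(napi, struct mydrv_rq, napi);

	/* numa_mem_id(): local memory node of the executing CPU */
	if (unlikely(rq->page_pool->p.nid != numa_mem_id()))
		page_pool_update_nid(rq->page_pool, numa_mem_id());

	return mydrv_poll_rx(rq, budget);
}
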
> Performance testing:
> XDP drop/TX rate and TCP single/multi stream on the mlx5 driver,
> while migrating the RX ring IRQ from a close to a far NUMA node:
> 
> The mlx5 internal page cache was locally disabled to get pure page
> pool results.

Could you show us the code that disables the local page cache?


> CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
> NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G)
> 
> XDP Drop/TX single core:
> NUMA  | XDP  | Before    | After
> ---------------------------------------
> Close | Drop | 11.0 Mpps | 10.9 Mpps
> Far   | Drop |  4.4 Mpps |  5.8 Mpps
> 
> Close | TX   |  6.5 Mpps |  6.5 Mpps
> Far   | TX   |  3.5 Mpps |  4.0 Mpps
> 
> The improvement is about 30% in drop packet rate and 15% in TX packet
> rate for the NUMA-far test.
> No degradation in the NUMA-close tests.
> 
> TCP single/multi cpu/stream:
> NUMA  | #cpu | Before  | After
> --------------------------------------
> Close | 1    | 18 Gbps | 18 Gbps
> Far   | 1    | 15 Gbps | 18 Gbps
> Close | 12   | 80 Gbps | 80 Gbps
> Far   | 12   | 68 Gbps | 80 Gbps
> 
> In all test cases we see an improvement in the NUMA-far case, and no
> impact in the NUMA-close case.
> 
> The cost of the added per-page check is negligible and shows no
> measurable performance degradation. Functionality-wise it also seems
> more correct and more robust for page pool to verify whether a page
> should be recycled, since page pool can't guarantee where pages come
> from.
> 
> Signed-off-by: Saeed Mahameed <saeedm@...lanox.com>
> Acked-by: Jonathan Lemon <jonathan.lemon@...il.com>
> ---
>  net/core/page_pool.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 953af6d414fb..73e4173c4dce 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -283,6 +283,17 @@ static bool __page_pool_recycle_direct(struct page *page,
>  	return true;
>  }
>  
> +/* A page is NOT reusable when:
> + * 1) it was allocated when the system was under memory pressure (page_is_pfmemalloc).
> + * 2) it belongs to a different NUMA node than pool->p.nid.
> + *
> + * To update pool->p.nid, users must call page_pool_update_nid().
> + */
> +static bool pool_page_reusable(struct page_pool *pool, struct page *page)
> +{
> +	return !page_is_pfmemalloc(page) && page_to_nid(page) == pool->p.nid;
> +}
> +
>  void __page_pool_put_page(struct page_pool *pool,
>  			  struct page *page, bool allow_direct)
>  {
> @@ -292,7 +303,8 @@ void __page_pool_put_page(struct page_pool *pool,
>  	 *
>  	 * refcnt == 1 means page_pool owns page, and can recycle it.
>  	 */
> -	if (likely(page_ref_count(page) == 1)) {
> +	if (likely(page_ref_count(page) == 1 &&
> +		   pool_page_reusable(pool, page))) {

I'm afraid that we are slowly chipping away at the performance benefit
with these incremental changes, each adding more checks. We have an
extreme performance use-case with XDP_DROP, where we want drivers to
use this code path to hit __page_pool_recycle_direct(), which is a
simple array update (protected under NAPI) into pool->alloc.cache[].

To preserve this hot path, you could instead flush pool->alloc.cache[]
in the page_pool_update_nid() call, and move the pool_page_reusable()
check into __page_pool_recycle_into_ring().  (Below I added '>>'
markers with the remaining code to make this easier to see.)


>  		/* Read barrier done in page_ref_count / READ_ONCE */
>  
>  		if (allow_direct && in_serving_softirq())
>>			if (__page_pool_recycle_direct(page, pool))
>>				return;
>>
>>		if (!__page_pool_recycle_into_ring(pool, page)) {
>>			/* Cache full, fallback to free pages */
>>			__page_pool_return_page(pool, page);
>>		}
>>		return;
>>	}
>>	/* Fallback/non-XDP mode: API user have elevated refcnt.
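
Something like this rough, untested sketch (assuming
page_pool_update_nid() runs in the driver's NAPI context, so that
touching pool->alloc.cache[] is safe):

void page_pool_update_nid(struct page_pool *pool, int new_nid)
{
	pool->p.nid = new_nid;

	/* Flush the NAPI-protected alloc cache; pages cached for the
	 * old node are returned, and refilled on demand from the new
	 * node.
	 */
	while (pool->alloc.count) {
		struct page *page;

		page = pool->alloc.cache[--pool->alloc.count];
		__page_pool_return_page(pool, page);
	}
}

static bool __page_pool_recycle_into_ring(struct page_pool *pool,
					  struct page *page)
{
	int ret;

	/* The reusable check moves here, off the XDP_DROP fast path */
	if (!pool_page_reusable(pool, page))
		return false;

	/* BH protection not needed if current is serving softirq */
	if (in_serving_softirq())
		ret = ptr_ring_produce(&pool->ring, page);
	else
		ret = ptr_ring_produce_bh(&pool->ring, page);

	return (ret == 0) ? true : false;
}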


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

For easier review:

/* Only allow direct recycling in special circumstances, into the
 * alloc side cache.  E.g. during RX-NAPI processing for XDP_DROP use-case.
 *
 * Caller must provide appropriate safe context.
 */
static bool __page_pool_recycle_direct(struct page *page,
				       struct page_pool *pool)
{
	if (unlikely(pool->alloc.count == PP_ALLOC_CACHE_SIZE))
		return false;

	/* Caller MUST have verified/know (page_ref_count(page) == 1) */
	pool->alloc.cache[pool->alloc.count++] = page;
	return true;
}
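
And for completeness, the matching fast-path consumer, simplified from
__page_pool_get_cached() (the refill-from-ptr_ring slow path is
omitted):

static struct page *__page_pool_get_cached(struct page_pool *pool)
{
	/* Fast-path: pop from the NAPI-protected array cache */
	if (likely(pool->alloc.count))
		return pool->alloc.cache[--pool->alloc.count];

	/* Slow-path: refill from the locked ptr_ring (omitted here) */
	return NULL;
}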

