Message-ID: <831ed886842c894f7b2ffe83fe34705180a86b3b.camel@mellanox.com>
Date:   Wed, 11 Dec 2019 21:24:23 +0000
From:   Saeed Mahameed <saeedm@...lanox.com>
To:     "brouer@...hat.com" <brouer@...hat.com>
CC:     "ilias.apalodimas@...aro.org" <ilias.apalodimas@...aro.org>,
        "jonathan.lemon@...il.com" <jonathan.lemon@...il.com>,
        "linyunsheng@...wei.com" <linyunsheng@...wei.com>,
        Li Rongqing <lirongqing@...du.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH][v2] page_pool: handle page recycle for NUMA_NO_NODE
 condition

On Wed, 2019-12-11 at 19:49 +0100, Jesper Dangaard Brouer wrote:
> On Sat, 7 Dec 2019 03:52:41 +0000
> Saeed Mahameed <saeedm@...lanox.com> wrote:
> 
> > I don't think it is correct to check that the page nid is same as
> > numa_mem_id() if pool is NUMA_NO_NODE. In such case we should allow
> > all
> > pages to recycle, because you can't assume where pages are
> > allocated
> > from and where they are being handled.
> 
> I agree, using numa_mem_id() is not valid, because it takes the numa
> node id from the executing CPU and the call to __page_pool_put_page()
> can happen on a remote CPU (e.g. cpumap redirect, and in future
> SKBs).
> 
> 
> > I suggest the following:
> > 
> > return !page_pfmemalloc() && 
> > ( page_to_nid(page) == pool->p.nid || pool->p.nid == NUMA_NO_NODE
> > );
> 
> Above code doesn't generate optimal ASM code, I suggest:
> 
>  static bool pool_page_reusable(struct page_pool *pool, struct page
> *page)
>  {
> 	return !page_is_pfmemalloc(page) &&
> 		pool->p.nid != NUMA_NO_NODE &&
> 		page_to_nid(page) == pool->p.nid;
>  }
> 

This is not equivalent to the above. Here, when pool->p.nid is
NUMA_NO_NODE, pool_page_reusable() will always be false.
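
For reference, a sketch of the variant I had in mind (same pfmemalloc
check, but allowing any node when the pool itself is NUMA_NO_NODE; not
final code):

 static bool pool_page_reusable(struct page_pool *pool, struct page *page)
 {
	/* never recycle emergency (pfmemalloc) pages; a NUMA_NO_NODE
	 * pool skips the node check and recycles from any node
	 */
	return !page_is_pfmemalloc(page) &&
	       (pool->p.nid == NUMA_NO_NODE ||
		page_to_nid(page) == pool->p.nid);
 }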

We can avoid the extra check in the data path.
How about avoiding NUMA_NO_NODE in page_pool altogether, and forcing
numa_mem_id() as pool->p.nid when the user requests NUMA_NO_NODE at
page pool init, as already done in alloc_pages_node()?

That would allow recycling without adding any extra condition to the
data path.

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index a6aefe989043..00c99282a306 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -28,6 +28,9 @@ static int page_pool_init(struct page_pool *pool,
 
        memcpy(&pool->p, params, sizeof(pool->p));
 
+       /* overwrite to allow recycling */
+       if (pool->p.nid == NUMA_NO_NODE)
+               pool->p.nid = numa_mem_id();
+

After a quick look, I don't see any reason to keep NUMA_NO_NODE in
pool->p.nid.
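
To illustrate (hypothetical driver snippet, not taken from any tree):
with the hunk above, a driver that creates its pool with
.nid = NUMA_NO_NODE simply ends up on the local node at init time, and
recycling keeps working with no extra branch in the data path:

	struct page_pool_params pp_params = {
		.order     = 0,
		.pool_size = 1024,          /* made-up size */
		.nid       = NUMA_NO_NODE,  /* resolved to numa_mem_id() at init */
		.dev       = dev,           /* hypothetical struct device */
		.dma_dir   = DMA_FROM_DEVICE,
	};
	struct page_pool *pool = page_pool_create(&pp_params);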


> I have compiled different variants and looked at the ASM code
> generated by GCC.  This seems to give the best result.
> 
> 
> > 1) never recycle emergency pages, regardless of pool nid.
> > 2) always recycle if pool is NUMA_NO_NODE.
> 
> Yes, this defines the semantics, that a page_pool configured with
> NUMA_NO_NODE means skip NUMA checks.  I think that sounds okay...
> 
> 
> > the above change should not add any overhead, a modest branch
> > predictor will handle this with no effort.
> 
> It still annoys me that we keep adding instructions to this code
> hot-path (I counted 34 bytes and 11 instructions in my proposed
> function).
> 
> I think that it might be possible to move these NUMA checks to
> alloc-side (instead of return/recycles side as today), and perhaps
> only
> on slow-path when dequeuing from ptr_ring (as recycles that call
> __page_pool_recycle_direct() will be pinned during NAPI).  But lets
> focus on a smaller fix for the immediate issue...
> 

I know. It annoys me too, but we need recycling to work in production,
where rings/NAPI can migrate and NUMA nodes can be NUMA_NO_NODE :-(.

