linux-kernel - Re: [PATCH][v2] page_pool: handle page recycle for NUMA_NO

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Date:   Thu, 12 Dec 2019 09:34:14 +0800
From:   Yunsheng Lin <linyunsheng@...wei.com>
To:     Saeed Mahameed <saeedm@...lanox.com>,
        "brouer@...hat.com" <brouer@...hat.com>
CC:     "ilias.apalodimas@...aro.org" <ilias.apalodimas@...aro.org>,
        "jonathan.lemon@...il.com" <jonathan.lemon@...il.com>,
        Li Rongqing <lirongqing@...du.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        <mhocko@...nel.org>, <peterz@...radead.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        <bhelgaas@...gle.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH][v2] page_pool: handle page recycle for NUMA_NO_NODE
 condition

+CC Michal, Peter, Greg and Bjorn
Because there has been disscusion about where and how the NUMA_NO_NODE
should be handled before.

On 2019/12/12 5:24, Saeed Mahameed wrote:
> On Wed, 2019-12-11 at 19:49 +0100, Jesper Dangaard Brouer wrote:
>> On Sat, 7 Dec 2019 03:52:41 +0000
>> Saeed Mahameed <saeedm@...lanox.com> wrote:
>>
>>> I don't think it is correct to check that the page nid is same as
>>> numa_mem_id() if pool is NUMA_NO_NODE. In such case we should allow
>>> all
>>> pages to recycle, because you can't assume where pages are
>>> allocated
>>> from and where they are being handled.
>>
>> I agree, using numa_mem_id() is not valid, because it takes the numa
>> node id from the executing CPU and the call to __page_pool_put_page()
>> can happen on a remote CPU (e.g. cpumap redirect, and in future
>> SKBs).
>>
>>
>>> I suggest the following:
>>>
>>> return !page_pfmemalloc() && 
>>> ( page_to_nid(page) == pool->p.nid || pool->p.nid == NUMA_NO_NODE
>>> );
>>
>> Above code doesn't generate optimal ASM code, I suggest:
>>
>>  static bool pool_page_reusable(struct page_pool *pool, struct page
>> *page)
>>  {
>> 	return !page_is_pfmemalloc(page) &&
>> 		pool->p.nid != NUMA_NO_NODE &&
>> 		page_to_nid(page) == pool->p.nid;
>>  }
>>
> 
> this is not equivalent to the above. Here in case pool->p.nid is
> NUMA_NO_NODE, pool_page_reusable() will always be false.
> 
> We can avoid the extra check in data path.
> How about avoiding NUMA_NO_NODE in page_pool altogether, and force
> numa_mem_id() as pool->p.nid when user requests NUMA_NO_NODE at page
> pool init, as already done in alloc_pages_node(). 

That means we will not support page reuse migragtion for NUMA_NO_NODE,
which is not same semantic that alloc_pages_node() handle NUMA_NO_NODE,
because alloc_pages_node() will allocate the page based on the node
of the current running cpu.

Also, There seems to be a wild guessing of the node id here, which has
been disscussed before and has not reached a agreement yet.

> 
> which will imply recycling without adding any extra condition to the
> data path.
> 
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index a6aefe989043..00c99282a306 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -28,6 +28,9 @@ static int page_pool_init(struct page_pool *pool,
>  
>         memcpy(&pool->p, params, sizeof(pool->p));
>  
> +	/* overwrite to allow recycling.. */
> +       if (pool->p.nid == NUMA_NO_NODE) 
> +               pool->p.nid = numa_mem_id(); 
> +
> 
> After a quick look, i don't see any reason why to keep NUMA_NO_NODE in
> pool->p.nid.. 
> 
> 
>> I have compiled different variants and looked at the A
>> SM code generated
>> by GCC.  This seems to give the best result.
>>
>>
>>> 1) never recycle emergency pages, regardless of pool nid.
>>> 2) always recycle if pool is NUMA_NO_NODE.
>>
>> Yes, this defines the semantics, that a page_pool configured with
>> NUMA_NO_NODE means skip NUMA checks.  I think that sounds okay...
>>
>>
>>> the above change should not add any overhead, a modest branch
>>> predictor will handle this with no effort.
>>
>> It still annoys me that we keep adding instructions to this code
>> hot-path (I counted 34 bytes and 11 instructions in my proposed
>> function).
>>
>> I think that it might be possible to move these NUMA checks to
>> alloc-side (instead of return/recycles side as today), and perhaps
>> only
>> on slow-path when dequeuing from ptr_ring (as recycles that call
>> __page_pool_recycle_direct() will be pinned during NAPI).  But lets
>> focus on a smaller fix for the immediate issue...
>>
> 
> I know. It annoys me too, but we need recycling to work in production :
> where rings/napi can migrate and numa nodes can be NUMA_NO_NODE :-(.
> 
>