lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 9 Dec 2019 09:31:16 +0800
From:   Yunsheng Lin <linyunsheng@...wei.com>
To:     Saeed Mahameed <saeedm@...lanox.com>,
        "jonathan.lemon@...il.com" <jonathan.lemon@...il.com>,
        Li Rongqing <lirongqing@...du.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "brouer@...hat.com" <brouer@...hat.com>,
        "ilias.apalodimas@...aro.org" <ilias.apalodimas@...aro.org>
Subject: Re: [PATCH][v2] page_pool: handle page recycle for NUMA_NO_NODE
 condition

On 2019/12/7 11:52, Saeed Mahameed wrote:
> On Fri, 2019-12-06 at 17:32 +0800, Li RongQing wrote:
>> some drivers uses page pool, but not require to allocate
>> pages from bound node, or simply assign pool.p.nid to
>> NUMA_NO_NODE, and the commit d5394610b1ba ("page_pool:
>> Don't recycle non-reusable pages") will block this kind
>> of driver to recycle
>>
>> so take page as reusable when page belongs to current
>> memory node if nid is NUMA_NO_NODE
>>
>> v1-->v2: add check with numa_mem_id from Yunsheng
>>
>> Fixes: d5394610b1ba ("page_pool: Don't recycle non-reusable pages")
>> Signed-off-by: Li RongQing <lirongqing@...du.com>
>> Suggested-by: Yunsheng Lin <linyunsheng@...wei.com>
>> Cc: Saeed Mahameed <saeedm@...lanox.com>
>> ---
>>  net/core/page_pool.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>> index a6aefe989043..3c8b51ccd1c1 100644
>> --- a/net/core/page_pool.c
>> +++ b/net/core/page_pool.c
>> @@ -312,12 +312,17 @@ static bool __page_pool_recycle_direct(struct
>> page *page,
>>  /* page is NOT reusable when:
>>   * 1) allocated when system is under some pressure.
>> (page_is_pfmemalloc)
>>   * 2) belongs to a different NUMA node than pool->p.nid.
>> + * 3) belongs to a different memory node than current context
>> + * if pool->p.nid is NUMA_NO_NODE
>>   *
>>   * To update pool->p.nid users must call page_pool_update_nid.
>>   */
>>  static bool pool_page_reusable(struct page_pool *pool, struct page
>> *page)
>>  {
>> -	return !page_is_pfmemalloc(page) && page_to_nid(page) == pool-
>>> p.nid;
>> +	return !page_is_pfmemalloc(page) &&
>> +		(page_to_nid(page) == pool->p.nid ||
>> +		(pool->p.nid == NUMA_NO_NODE &&
>> +		page_to_nid(page) == numa_mem_id()));
>>  }
>>  
> 
> Cc'ed Jesper, Ilias & Jonathan.
> 
> I don't think it is correct to check that the page nid is same as
> numa_mem_id() if pool is NUMA_NO_NODE. In such case we should allow all
> pages to recycle, because you can't assume where pages are allocated
> from and where they are being handled.
> 
> I suggest the following:
> 
> return !page_pfmemalloc() && 
> ( page_to_nid(page) == pool->p.nid || pool->p.nid == NUMA_NO_NODE );
> 
> 1) never recycle emergency pages, regardless of pool nid.
> 2) always recycle if pool is NUMA_NO_NODE.

As I can see, below are the cases that the pool->p.nid could be NUMA_NO_NODE:

1. kernel built with the CONFIG_NUMA being off.

2. kernel built with the CONFIG_NUMA being on, but FW/BIOS dose not provide
   a valid node id through ACPI/DT, and it has the below cases:

   a). the hardware itself is single numa node system, so maybe it is valid
       to not provide a valid node for the device.
   b). the hardware itself is multi numa nodes system, and the FW/BIOS is
       broken that it does not provide a valid one.

3. kernel built with the CONFIG_NUMA being on, and FW/BIOS dose provide a
   valid node id through ACPI/DT, but the driver does not pass the valid
   node id when calling page_pool_init().

I am not sure which case this patch is trying to fix, maybe Rongqing can
help to clarify.

For case 1 and case 2 (a), I suppose checking pool->p.nid being NUMA_NO_NODE
is enough.

For case 2 (b), There may be argument that it should be fixed in the BIOS/FW,
But it also can be argued that the numa_mem_id() checking has been done in the
driver that has not using page pool yet when deciding whether to do recycling,
see [1]. If I understanding correctly, recycling is normally done at the NAPI
pooling, which has the same affinity as the rx interrupt, and rx interrupt is
not changed very often. If it does change to different memory node, maybe it
makes sense not to recycle the old page belongs to old node?


For case 3, I am not sure if any driver is doing that, and if the page pool API
even allow that?

[1] https://elixir.bootlin.com/linux/latest/ident/numa_mem_id

> 
> the above change should not add any overhead, a modest branch predictor
> will handle this with no effort.
> 
> Jesper et al. what do you think?
> 
> -Saeed.
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ