Message-ID: <20241205164233.64512141@kernel.org>
Date: Thu, 5 Dec 2024 16:42:33 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Yunsheng Lin <linyunsheng@...wei.com>
Cc: <davem@...emloft.net>, <pabeni@...hat.com>, <liuyonglong@...wei.com>,
 <fanghaiqing@...wei.com>, <zhangkun09@...wei.com>, Alexander Lobakin
 <aleksander.lobakin@...el.com>, Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
 Jesper Dangaard Brouer <hawk@...nel.org>, Ilias Apalodimas
 <ilias.apalodimas@...aro.org>, Eric Dumazet <edumazet@...gle.com>, Simon
 Horman <horms@...nel.org>, <netdev@...r.kernel.org>,
 <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH RFC v4 1/3] page_pool: fix timing for checking and
 disabling napi_local

On Thu, 5 Dec 2024 19:43:25 +0800 Yunsheng Lin wrote:
> It depends on what the caller is trying to protect by calling
> page_pool_disable_direct_recycling().
> 
> It seems the use case for the only user of the API, the bnxt driver,
> is reusing the same NAPI for different page_pool instances.
> 
> According to the steps in netdev_rx_queue.c:
> 1. allocate new queue memory & create page_pool
> 2. stop old rx queue.
> 3. start new rx queue with new page_pool
> 4. free old queue memory + destroy page_pool.
> 
> page_pool_disable_direct_recycling() is called in step 2. I am not
> sure how napi_enable() & napi_disable() are called in the above flow,
> but there seems to be no use-after-free problem for this patch to fix
> in the above flow.
> 
> It doesn't seem to have any concurrent access problem if napi->list_owner
> is set to -1 before napi_disable() returns and napi_enable() for the
> new queue is called after page_pool_disable_direct_recycling() in step 2.
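
For reference, a minimal sketch of that restart flow (the queue helper
names here are placeholders for illustration, not the actual
net/core/netdev_rx_queue.c code; only page_pool_disable_direct_recycling()
is the real API):

	/* Sketch of the four steps quoted above, not verbatim kernel code. */
	static int rx_queue_restart_sketch(struct net_device *dev, int idx)
	{
		void *new_mem, *old_mem;

		new_mem = queue_mem_alloc(dev, idx);	/* 1: new page_pool created */
		queue_stop(dev, idx, &old_mem);		/* 2: page_pool_disable_direct_recycling()
							 *    called on the old pool */
		queue_start(dev, idx, new_mem);		/* 3: NAPI may be re-enabled here */
		queue_mem_free(dev, old_mem);		/* 4: page_pool_destroy() on old pool */
		return 0;
	}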

The fix presupposes there is a long delay between fetching the NAPI
pointer and accessing it. The concern is that the NAPI gets restarted
in step 3 after we have already READ_ONCE()'ed the pointer; we then
access it, judge it to be running on the same core, and put the page
into the fast cache, which will never get flushed.
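
Roughly, the window looks like this (a sketch modeled on the
page_pool_napi_local()-style check; the recycle helper at the end is a
placeholder, not verbatim kernel code):

	unsigned int cpuid = smp_processor_id();
	const struct napi_struct *napi;

	/* Fetch the NAPI pointer, possibly before step 3 runs. */
	napi = READ_ONCE(pool->p.napi);

	/* Steps 2-3 can stop and restart the NAPI right here; if it
	 * comes back up on this CPU, list_owner matches again. */

	if (napi && READ_ONCE(napi->list_owner) == (int)cpuid)
		/* Wrongly take the lockless path: the page lands in
		 * the fast cache of a pool that is being torn down
		 * and is never flushed. */
		recycle_in_cache(page, pool);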
