linux-kernel - Re: [PATCH RFC v4 1/3] page_pool: fix timing for checking and disabling napi


Message-ID: <87a40a3f-1f96-4a21-a546-f057e78bd44f@gmail.com>
Date: Sat, 7 Dec 2024 13:52:11 +0800
From: Yunsheng Lin <yunshenglin0825@...il.com>
To: Jakub Kicinski <kuba@...nel.org>, Yunsheng Lin <linyunsheng@...wei.com>
Cc: davem@...emloft.net, pabeni@...hat.com, liuyonglong@...wei.com,
 fanghaiqing@...wei.com, zhangkun09@...wei.com,
 Alexander Lobakin <aleksander.lobakin@...el.com>,
 Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
 Jesper Dangaard Brouer <hawk@...nel.org>,
 Ilias Apalodimas <ilias.apalodimas@...aro.org>,
 Eric Dumazet <edumazet@...gle.com>, Simon Horman <horms@...nel.org>,
 netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
 David Wei <dw@...idwei.uk>, Shailend Chand <shailend@...gle.com>,
 Michael Chan <michael.chan@...adcom.com>
Subject: Re: [PATCH RFC v4 1/3] page_pool: fix timing for checking and
 disabling napi_local

On 12/7/2024 12:09 AM, Jakub Kicinski wrote:

...

>>
>> It seems the napi_disable() is called before netdev_rx_queue_restart()
>> and napi_enable() and ____napi_schedule() are called after
>> netdev_rx_queue_restart() as there is no napi API called in the
>> implementation of 'netdev_queue_mgmt_ops' for bnxt driver?
>>
>> If yes, napi->list_owner is set to -1 before step 1 and only set to
>> a valid cpu in step 6 as below:
>> 1. napi_disable()
>> 2. allocate new queue memory & create new page_pool.
>> 3. stop old rx queue.
>> 4. start new rx queue with new page_pool.
>> 5. free old queue memory + destroy old page_pool.
>> 6. napi_enable() & ____napi_schedule()
>>
>> And there are at least three flows involved here:
>> flow 1: calling napi_complete_done() and set napi->list_owner to -1.
>> flow 2: calling netdev_rx_queue_restart().
>> flow 3: calling skb_defer_free_flush() with the page belonging to the old
>>         page_pool.
>>
>> The only case of page_pool_napi_local() returning true in flow 3 I can
>> think of is that flow 1 and flow 3 might need to be called in the softirq
>> of the same CPU and flow 3 might need to be called before flow 1.
>>
>> It seems impossible that page_pool_napi_local() will return true between
>> step 1 and step 6 as updated napi->list_owner is always seen by flow 3
>> when they are both called in the softirq context of the same CPU or
>> napi->list_owner != CPU that calling flow 3, which seems like an implicit
>> assumption for the case of napi scheduling between different cpus too.
>>
>> And old page_pool is destroyed in step 5, I am not sure if it is necessary
>> to call page_pool_disable_direct_recycling() in step 3 if page_pool_destroy()
>> already have the synchronize_rcu() in step 5 before enabling napi.
>>
>> If not, maybe I am missing something here.
> 
> Yes, I believe you got the steps 5 and 6 backwards.

Maybe, but I am not yet sure how step 6 could be called before step 5.
It seems that two drivers implement 'netdev_queue_mgmt_ops' now, and
only bnxt calls page_pool_disable_direct_recycling(). Its implementation
doesn't call any napi related API, see bnxt_queue_mgmt_ops:
https://elixir.bootlin.com/linux/v6.13-rc1/source/drivers/net/ethernet/broadcom/bnxt/bnxt.c#L15539

And netdev_rx_queue_restart() seems to call the above ops without
calling any napi related API:
https://elixir.bootlin.com/linux/v6.12.3/source/net/core/netdev_rx_queue.c#L9

The napi related API seems to be called only in bnxt_open_nic() and
bnxt_close_nic() in the bnxt driver, and they don't seem to be directly
related to the queue_mgmt_ops.
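
To make the ordering question concrete, here is a minimal user-space
sketch of the six steps quoted above. It is not kernel code: every toy_*
name is hypothetical, the check is reduced to the napi pointer and the
list_owner comparison (it ignores the softirq and pool cpuid checks), and
step 5 is modeled as simply clearing the old pool's napi pointer, which is
an assumption about what destroying the pool does. It only shows that, in
this model, flow 3 can see a "local" result on the old page_pool when
step 6 runs before step 5, and cannot when step 5 runs first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NO_OWNER (-1)	/* stands in for list_owner == -1 */

/* Hypothetical stand-ins for napi_struct and page_pool. */
struct toy_napi {
	int list_owner;
};

struct toy_pool {
	struct toy_napi *napi;
};

/* Simplified model of the check done in flow 3 on a given cpu. */
static bool toy_napi_local(const struct toy_pool *pool, int cpu)
{
	const struct toy_napi *napi = pool->napi;

	return napi && napi->list_owner == cpu;
}

/*
 * Walk steps 1-6 for a napi that was running on @cpu (cpu >= 0) and
 * count how often flow 3 would treat the OLD pool as napi-local.
 * @enable_before_destroy swaps the order of steps 5 and 6.
 */
static int toy_restart_hits(int cpu, bool enable_before_destroy)
{
	struct toy_napi napi = { .list_owner = cpu };
	struct toy_pool old_pool = { .napi = &napi };
	int hits = 0;

	napi.list_owner = TOY_NO_OWNER;		/* step 1: napi disabled */
	hits += toy_napi_local(&old_pool, cpu);	/* flow 3 during steps 2-4 */

	if (enable_before_destroy) {
		napi.list_owner = cpu;		/* step 6 runs first */
		hits += toy_napi_local(&old_pool, cpu);
		old_pool.napi = NULL;		/* step 5: assumed unlink */
	} else {
		old_pool.napi = NULL;		/* step 5: assumed unlink */
		napi.list_owner = cpu;		/* step 6 */
		hits += toy_napi_local(&old_pool, cpu);
	}

	return hits;
}
```

In this model toy_restart_hits(cpu, false) counts no hits, while
toy_restart_hits(cpu, true) counts one, so the answer depends entirely on
which order the real code uses.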

+cc relevant authors and maintainers to see if they can clarify, as I am
not really familiar with the queue mgmt related sequence.