Message-ID: <2aa84c61-6531-4f17-89e5-101f46ef00d0@huawei.com>
Date: Wed, 22 Jan 2025 19:02:29 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Toke Høiland-Jørgensen <toke@...hat.com>, Yunsheng
Lin <yunshenglin0825@...il.com>, <davem@...emloft.net>, <kuba@...nel.org>,
<pabeni@...hat.com>
CC: <zhangkun09@...wei.com>, <liuyonglong@...wei.com>,
<fanghaiqing@...wei.com>, Alexander Lobakin <aleksander.lobakin@...el.com>,
Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, Jesper Dangaard Brouer
<hawk@...nel.org>, Ilias Apalodimas <ilias.apalodimas@...aro.org>, Eric
Dumazet <edumazet@...gle.com>, Simon Horman <horms@...nel.org>,
<netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net-next v7 2/8] page_pool: fix timing for checking and
disabling napi_local
On 2025/1/20 19:24, Toke Høiland-Jørgensen wrote:
...
>>>>
>>>> void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem,
>>>> @@ -1165,6 +1172,12 @@ void page_pool_destroy(struct page_pool *pool)
>>>> if (!page_pool_release(pool))
>>>> return;
>>>>
>>>> + /* Paired with rcu lock in page_pool_napi_local() to ensure the
>>>> + * clearing of pool->p.napi in page_pool_disable_direct_recycling()
>>>> + * is seen before returning to the driver to free the napi instance.
>>>> + */
>>>> + synchronize_rcu();
>>>
>>> Most drivers call page_pool_destroy() in a loop for each RX queue, so
>>> now you're introducing a full synchronize_rcu() wait for each queue.
>>> That can delay tearing down the device significantly, so I don't think
>>> this is a good idea.
>>
>> synchronize_rcu() is called after page_pool_release(pool), which means
>> it is only called when there are some inflight pages, so there is not
>> necessarily a full synchronize_rcu() wait for each queue.
>>
>> Anyway, it seems that there are some cases that need an explicit
>> synchronize_rcu() and some cases that depend on another API providing
>> synchronize_rcu() semantics; maybe we should provide two different APIs
>> for both cases, like the netif_napi_del()/__netif_napi_del() APIs do?
>
> I don't think so. This race can only be triggered if:
>
> - An skb is allocated from a page_pool with a napi instance attached
>
> - That skb is freed *in softirq context* while the memory backing the
> NAPI instance is being freed.
>
> It's only valid to free a napi instance after calling netif_napi_del(),
> which does a full synchronize_rcu(). This means that any running
> softirqs will have exited at this point, and all packets will have been
> flushed from the deferred freeing queues. And since the NAPI has been
> stopped at this point, no new packets can enter the deferred freeing
> queue from that NAPI instance.
Note that skb_defer_free_flush() can be called without being bound to any
NAPI instance, see the skb_defer_free_flush() called from net_rx_action(),
which means packets from that NAPI instance can still be freed in softirq
context even after the NAPI has been stopped.
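To make that concrete, below is a rough interleaving of how I understand
the window could look with the rcu lock this patch adds in
page_pool_napi_local(); the exact reads are from my reading of the code,
and the teardown column is only illustrative:

    CPU1 (softirq, deferred freeing)      CPU0 (driver teardown)
    --------------------------------      ----------------------
    rcu_read_lock()
    napi = READ_ONCE(pool->p.napi)
                                          page_pool_disable_direct_recycling():
                                            WRITE_ONCE(pool->p.napi, NULL)
                                          free the napi instance
    READ_ONCE(napi->list_owner)  /* may touch freed napi memory */
    rcu_read_unlock()

Without a synchronize_rcu() between clearing pool->p.napi and freeing the
napi instance, CPU1 can still be inside the rcu read-side critical section
when the napi memory goes away.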
>
> So I really don't see a way for this race to happen with correct usage
> of the page_pool and NAPI APIs, which means there's no reason to make
> the change you are proposing here.
I looked at one driver setting pp->napi: the bnxt driver doesn't seem to
call page_pool_disable_direct_recycling() when unloading, see
bnxt_half_close_nic(); page_pool_disable_direct_recycling() seems to be
called only for the new queue_mgmt API:
/* rtnl_lock held, this call can only be made after a previous successful
 * call to bnxt_half_open_nic().
 */
void bnxt_half_close_nic(struct bnxt *bp)
{
        bnxt_hwrm_resource_free(bp, false, true);
        bnxt_del_napi(bp);              /* <-- calls napi del and rcu sync */
        bnxt_free_skbs(bp);
        bnxt_free_mem(bp, true);        /* <-- calls page_pool_destroy() */
        clear_bit(BNXT_STATE_HALF_OPEN, &bp->state);
}
Even if page_pool_disable_direct_recycling() were called between
bnxt_del_napi() and bnxt_free_mem(), the timing window would still exist,
as the rcu sync needs to happen after page_pool_disable_direct_recycling().
It seems some refactoring is needed in the bnxt driver to reuse the rcu
sync from the NAPI API, in order to avoid calling the rcu sync in
page_pool_destroy().
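Roughly, the ordering I have in mind for bnxt would look something like the
below (purely illustrative; bnxt_disable_page_pool_direct_recycling() is a
hypothetical helper and the placement would need checking against the
actual driver structure):

        void bnxt_half_close_nic(struct bnxt *bp)
        {
                bnxt_hwrm_resource_free(bp, false, true);
                /* hypothetical helper: loop over the RX rings and call
                 * page_pool_disable_direct_recycling(rxr->page_pool), so
                 * that the rcu sync inside bnxt_del_napi() below also
                 * covers readers of pool->p.napi
                 */
                bnxt_disable_page_pool_direct_recycling(bp);
                bnxt_del_napi(bp);              /* napi del + rcu sync */
                bnxt_free_skbs(bp);
                bnxt_free_mem(bp, true);        /* page_pool_destroy() without
                                                 * needing its own rcu sync
                                                 */
                clear_bit(BNXT_STATE_HALF_OPEN, &bp->state);
        }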
>
> -Toke
>
>