Message-ID: <84282526-6229-41c7-8f6b-5f2c500dcd8e@gmail.com>
Date: Sat, 25 Jan 2025 22:21:31 +0800
From: Yunsheng Lin <yunshenglin0825@...il.com>
To: Toke Høiland-Jørgensen <toke@...hat.com>,
 Yunsheng Lin <linyunsheng@...wei.com>, davem@...emloft.net, kuba@...nel.org,
 pabeni@...hat.com
Cc: zhangkun09@...wei.com, liuyonglong@...wei.com, fanghaiqing@...wei.com,
 Alexander Lobakin <aleksander.lobakin@...el.com>,
 Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
 Jesper Dangaard Brouer <hawk@...nel.org>,
 Ilias Apalodimas <ilias.apalodimas@...aro.org>,
 Eric Dumazet <edumazet@...gle.com>, Simon Horman <horms@...nel.org>,
 netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next v7 2/8] page_pool: fix timing for checking and
 disabling napi_local

On 1/25/2025 1:13 AM, Toke Høiland-Jørgensen wrote:
> Yunsheng Lin <linyunsheng@...wei.com> writes:
> 
>>> So I really don't see a way for this race to happen with correct usage
>>> of the page_pool and NAPI APIs, which means there's no reason to make
>>> the change you are proposing here.
>>
>> I looked at one driver setting pp->napi; it seems the bnxt driver does not
>> call page_pool_disable_direct_recycling() when unloading, see
>> bnxt_half_close_nic(). page_pool_disable_direct_recycling() seems to be
>> called only for the new queue_mgmt API:
>>
>> /* rtnl_lock held, this call can only be made after a previous successful
>>   * call to bnxt_half_open_nic().
>>   */
>> void bnxt_half_close_nic(struct bnxt *bp)
>> {
>> 	bnxt_hwrm_resource_free(bp, false, true);
>> 	bnxt_del_napi(bp);       *----call napi del and rcu sync----*
>> 	bnxt_free_skbs(bp);
>> 	bnxt_free_mem(bp, true); *------call page_pool_destroy()----*
>> 	clear_bit(BNXT_STATE_HALF_OPEN, &bp->state);
>> }
>>
>> Even if there were a page_pool_disable_direct_recycling() called between
>> bnxt_del_napi() and bnxt_free_mem(), the timing window would still exist,
>> as the rcu sync needs to be called after page_pool_disable_direct_recycling().
>> It seems some refactoring is needed in the bnxt driver to reuse the rcu sync
>> from the NAPI API, in order to avoid calling the rcu sync for
>> page_pool_destroy().
> 
> Well, I would consider that usage buggy. A page pool object is created
> with a reference to the napi struct; so the page pool should also be
> destroyed (clearing its reference) before the napi memory is freed. I
> guess this is not really documented anywhere, but it's pretty standard
> practice to free objects in the opposite order of their creation.

I am not that familiar with the rules around the NAPI creation API, but the
bnxt driver implementation can take a reference to the 'struct napi' before
calling netif_napi_add(), see below:

static int __bnxt_open_nic(struct bnxt *bp, bool irq_re_init, bool link_re_init)
{
	.......
	rc = bnxt_alloc_mem(bp, irq_re_init);     *create page_pool*
	if (rc) {
		netdev_err(bp->dev, "bnxt_alloc_mem err: %x\n", rc);
		goto open_err_free_mem;
	}

	if (irq_re_init) {
		bnxt_init_napi(bp);                *netif_napi_add*
		rc = bnxt_request_irq(bp);
		if (rc) {
			netdev_err(bp->dev, "bnxt_request_irq err: %x\n", rc);
			goto open_err_irq;
		}
	}

	.....
}
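
Just to make the ordering concrete, below is a rough sketch (not an actual
patch; it assumes the usual per-rx-ring page_pool layout in bnxt) of what
reusing the rcu sync from the NAPI API in bnxt_half_close_nic() might look
like, with page_pool_disable_direct_recycling() moved before bnxt_del_napi():

/* Hypothetical refactor sketch only: disable direct recycling for every
 * rx ring's page_pool before bnxt_del_napi(), so the rcu sync done as
 * part of NAPI deletion also covers the napi_local check, and
 * page_pool_destroy() in bnxt_free_mem() would not need its own sync.
 */
void bnxt_half_close_nic(struct bnxt *bp)
{
	int i;

	bnxt_hwrm_resource_free(bp, false, true);

	for (i = 0; i < bp->rx_nr_rings; i++)
		page_pool_disable_direct_recycling(bp->rx_ring[i].page_pool);

	bnxt_del_napi(bp);       /* napi del and rcu sync, now covering the above */
	bnxt_free_skbs(bp);
	bnxt_free_mem(bp, true); /* page_pool_destroy() */
	clear_bit(BNXT_STATE_HALF_OPEN, &bp->state);
}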

> 
> So no, I don't think this is something that should be fixed on the page
> pool side (and certainly not by adding another synchronize_rcu() call
> per queue!); rather, we should fix the drivers that get this wrong (and
> probably document the requirement a bit better).

Even if the timing problem of checking and disabling napi_local should not
be fixed on the page_pool side, do we have a common understanding that the
DMA API misuse problem does need to be fixed on the page_pool side?
If yes, do we also have a common understanding that some mechanism like
synchronize_rcu() may still be needed on the page_pool side?

If no, I am not sure there is any better idea for fixing the DMA API misuse
problem after all the previous discussion.

If yes, it may be better to focus on discussing how to avoid calling the rcu
sync for each queue, as mentioned in [1].

1. https://lore.kernel.org/all/22de6033-744e-486e-bbd9-8950249cd018@huawei.com/
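
To make the synchronize_rcu() question concrete, here is a simplified,
purely illustrative sketch (page_pool_destroy_sketch() is a hypothetical
wrapper, not the actual net/core/page_pool.c code) of where such a grace
period would have to sit, and why it currently ends up being per pool,
i.e. per queue:

/* Illustration only: if the DMA API misuse is fixed on the page_pool
 * side, a grace period has to separate disabling direct recycling from
 * touching the DMA state of pages still held by the pool.
 */
static void page_pool_destroy_sketch(struct page_pool *pool)
{
	page_pool_disable_direct_recycling(pool);

	/* Wait for any NAPI/softirq context that may have observed the
	 * old pool state before the store above became visible.
	 */
	synchronize_rcu();

	/* Only now is it safe to deal with the inflight pages and
	 * release the pool; doing this once per pool is the per-queue
	 * cost that [1] is trying to avoid.
	 */
	page_pool_destroy(pool);
}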
