Message-ID:
<MWHPR1801MB1918F1D7686BDBC8817E473FD31CA@MWHPR1801MB1918.namprd18.prod.outlook.com>
Date: Wed, 23 Aug 2023 12:28:58 +0000
From: Ratheesh Kannoth <rkannoth@...vell.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
CC: "David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Geethasowjanya Akula <gakula@...vell.com>,
	Ilias Apalodimas <ilias.apalodimas@...aro.org>,
	Jakub Kicinski <kuba@...nel.org>,
	Jesper Dangaard Brouer <hawk@...nel.org>,
	Paolo Abeni <pabeni@...hat.com>,
	Subbaraya Sundeep Bhatta <sbhatta@...vell.com>,
	Sunil Kovvuri Goutham <sgoutham@...vell.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Hariprasad Kelam <hkelam@...vell.com>
Subject: RE: [EXT] [BUG] Possible unsafe page_pool usage in octeontx2
> From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
> Sent: Wednesday, August 23, 2023 3:18 PM
> Subject: [EXT] [BUG] Possible unsafe page_pool usage in octeontx2
>
> This breaks in octeontx2 where a worker is used to fill the buffer:
> otx2_pool_refill_task() -> otx2_alloc_rbuf() -> __otx2_alloc_rbuf() ->
> otx2_alloc_pool_buf() -> page_pool_alloc_frag().
>
Thanks. I agree. This path is called in the octeontx2 driver only if __otx2_alloc_rbuf() returns an error in the path below:
otx2_napi_handler() --> pfvf->hw_ops->refill_pool_ptrs() --> cn10k_refill_pool_ptrs() --> otx2_alloc_buffer() --> __otx2_alloc_rbuf().
As I understand it, the problem is that the refill work may get scheduled on another CPU. If we use a BOUND workqueue, do you think this problem can be solved? A rough sketch of what I have in mind is below.
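
A minimal sketch of the bound-workqueue idea, assuming a delayed_work item similar to the driver's refill task; the struct, field and function names here (otx2_schedule_refill, refill_work, refill_cpu) are illustrative, not the actual octeontx2 code:

#include <linux/workqueue.h>
#include <linux/smp.h>
#include <linux/jiffies.h>

struct otx2_refill_ctx {
	struct delayed_work refill_work;	/* hypothetical refill work item */
	int refill_cpu;				/* CPU the NAPI poll ran on */
};

/* Called from NAPI context when __otx2_alloc_rbuf() fails, so
 * smp_processor_id() is stable here (softirq, preemption off). */
static void otx2_schedule_refill(struct otx2_refill_ctx *ctx)
{
	ctx->refill_cpu = smp_processor_id();

	/* queue_delayed_work_on() pins the work to the NAPI CPU, so the
	 * page_pool per-CPU direct cache is only ever touched from the
	 * same CPU as the NAPI callback.  The worker would still need
	 * BH disabled (or the in_serving_softirq() check you propose)
	 * to avoid racing with the local softirq itself. */
	queue_delayed_work_on(ctx->refill_cpu, system_wq,
			      &ctx->refill_work, msecs_to_jiffies(10));
}
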
> BH is disabled but the add of a page can still happen while NAPI callback runs
> on a remote CPU and so corrupting the index/array.
>
> API wise I would suggest to
>
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 7ff80b80a6f9f..b50e219470a36 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -612,7 +612,7 @@ __page_pool_put_page(struct page_pool *pool, struct page *page,
>  			page_pool_dma_sync_for_device(pool, page,
>  						      dma_sync_size);
>
> -		if (allow_direct && in_softirq() &&
> +		if (allow_direct && in_serving_softirq() &&
>  		    page_pool_recycle_in_cache(page, pool))
>  			return NULL;
>
> because the intention (as I understand it) is to be invoked from within the
> NAPI callback (while softirq is served) and not if BH is just disabled due to a
> lock or so.
Could you help me understand where the in_softirq() check will break? If we TX a packet (dev_queue_xmit()) in
process context on the same core, will the in_serving_softirq() check prevent it from recycling? My reading of the two macros is below.
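
For reference, this is how I read the two checks, paraphrased from include/linux/preempt.h on a non-PREEMPT_RT kernel (sketch, not the verbatim kernel source):

/* Non-zero while a softirq is being served OR while BH is merely
 * disabled, e.g. under local_bh_disable() or a BH-disabling lock. */
#define in_softirq()		(softirq_count())

/* Non-zero only while a softirq handler (e.g. the NAPI callback)
 * is actually executing on this CPU. */
#define in_serving_softirq()	(softirq_count() & SOFTIRQ_OFFSET)
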