Message-ID: <MWHPR1801MB1918F1D7686BDBC8817E473FD31CA@MWHPR1801MB1918.namprd18.prod.outlook.com>
Date: Wed, 23 Aug 2023 12:28:58 +0000
From: Ratheesh Kannoth <rkannoth@...vell.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
CC: "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Geethasowjanya Akula <gakula@...vell.com>,
        Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        Jakub Kicinski <kuba@...nel.org>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Subbaraya Sundeep Bhatta <sbhatta@...vell.com>,
        Sunil Kovvuri Goutham <sgoutham@...vell.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Hariprasad Kelam <hkelam@...vell.com>
Subject: RE: [EXT] [BUG] Possible unsafe page_pool usage in octeontx2

> From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
> Sent: Wednesday, August 23, 2023 3:18 PM
> Subject: [EXT] [BUG] Possible unsafe page_pool usage in octeontx2
> 
> This breaks in octeontx2 where a worker is used to fill the buffer:
>   otx2_pool_refill_task() -> otx2_alloc_rbuf() -> __otx2_alloc_rbuf() ->
>   otx2_alloc_pool_buf() -> page_pool_alloc_frag().
>
Thanks. I agree. This path is taken in the octeontx2 driver only if __otx2_alloc_rbuf() returns an error in the path below:
otx2_napi_handler() --> pfvf->hw_ops->refill_pool_ptrs() --> cn10k_refill_pool_ptrs() --> otx2_alloc_buffer() --> __otx2_alloc_rbuf().
As I understand it, the problem is that the workqueue may get scheduled on another CPU. If we use a BOUND workqueue, do you think this problem can be solved?
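
To make the BOUND-workqueue idea concrete, here is a minimal sketch (my own illustration, not the actual octeontx2 code; the struct and function names are hypothetical) of queueing the refill work on the CPU that owns the RX NAPI instance. The idea would be that, combined with the existing local_bh_disable() around the pool access, the worker no longer races with that CPU's NAPI poll:

#include <linux/jiffies.h>
#include <linux/workqueue.h>

/* Hypothetical container for the refill work -- illustration only. */
struct rbuf_refill_work {
	struct delayed_work dwork;
	int napi_cpu;		/* CPU the RX NAPI instance runs on */
};

static void rbuf_refill_schedule(struct rbuf_refill_work *rw)
{
	/*
	 * queue_delayed_work_on() pins the work item to napi_cpu, unlike
	 * schedule_delayed_work(), which may run it on any CPU.
	 */
	queue_delayed_work_on(rw->napi_cpu, system_wq, &rw->dwork,
			      msecs_to_jiffies(10));
}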
 
> BH is disabled, but the add of a page can still happen while the NAPI callback
> runs on a remote CPU, and so corrupt the index/array.
> 
> API wise I would suggest to
> 
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 7ff80b80a6f9f..b50e219470a36 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -612,7 +612,7 @@ __page_pool_put_page(struct page_pool *pool, struct page *page,
>  			page_pool_dma_sync_for_device(pool, page,
>  						      dma_sync_size);
> 
> -		if (allow_direct && in_softirq() &&
> +		if (allow_direct && in_serving_softirq() &&
>  		    page_pool_recycle_in_cache(page, pool))
>  			return NULL;
> 
> because the intention (as I understand it) is that this is invoked from within
> the NAPI callback (while the softirq is being served) and not when BH is merely
> disabled due to a lock or similar.
Could you help me understand where the in_softirq() check will break? If we TX a packet (dev_queue_xmit()) in
process context on the same core, will the in_serving_softirq() check prevent it from recycling?
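
For reference, a tiny self-contained sketch (mine, not from the thread) of why the two checks differ: with BH merely disabled in process context (e.g. under a *_bh lock), in_softirq() already reports true, while in_serving_softirq() is true only while a softirq handler such as the NAPI poll is actually executing:

#include <linux/bottom_half.h>	/* local_bh_disable()/local_bh_enable() */
#include <linux/interrupt.h>	/* in_softirq(), in_serving_softirq() */
#include <linux/printk.h>

static void compare_softirq_checks(void)
{
	local_bh_disable();
	/*
	 * Process context with BH disabled: softirq_count() is non-zero,
	 * so in_softirq() reports true, but no softirq handler is running,
	 * so in_serving_softirq() reports false.
	 */
	pr_info("in_softirq=%d in_serving_softirq=%d\n",
		!!in_softirq(), !!in_serving_softirq());
	local_bh_enable();
}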



 
