netdev - Re: [BUG] Possible unsafe page

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d1f43386-b337-db94-7d9d-d078cd20c927@intel.com>
Date: Mon, 28 Aug 2023 13:07:12 +0200
From: Alexander Lobakin <aleksander.lobakin@...el.com>
To: Jesper Dangaard Brouer <hawk@...nel.org>
CC: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	<netdev@...r.kernel.org>, Ratheesh Kannoth <rkannoth@...vell.com>, "David S.
 Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, "Geetha
 sowjanya" <gakula@...vell.com>, Ilias Apalodimas
	<ilias.apalodimas@...aro.org>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni
	<pabeni@...hat.com>, Subbaraya Sundeep <sbhatta@...vell.com>, Sunil Goutham
	<sgoutham@...vell.com>, Thomas Gleixner <tglx@...utronix.de>, hariprasad
	<hkelam@...vell.com>, Qingfang DENG <qingfang.deng@...lower.com.cn>
Subject: Re: [BUG] Possible unsafe page_pool usage in octeontx2

From: Jesper Dangaard Brouer <hawk@...nel.org>
Date: Fri, 25 Aug 2023 19:25:42 +0200

> 
> 
> On 25/08/2023 15.38, Alexander Lobakin wrote:
>> From: Jesper Dangaard Brouer <hawk@...nel.org>
>> Date: Fri, 25 Aug 2023 15:22:05 +0200
>>
>>>
>>>
>>> On 24/08/2023 17.26, Alexander Lobakin wrote:
>>>> From: Jesper Dangaard Brouer<hawk@...nel.org>
>>>> Date: Wed, 23 Aug 2023 21:45:04 +0200
>>>>
>>>>> (Cc Olek as he have changes in this code path)
>>>> Thanks! I was reading the thread a bit on LKML, but being in the CC
>>>> list
>>>> is more convenient :D
>>>>
>>>
>>> :D
>>>
>>>>> On 23/08/2023 11.47, Sebastian Andrzej Siewior wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I've been looking at the page_pool locking.
>>>>>>
>>>>>> page_pool_alloc_frag() -> page_pool_alloc_pages() ->
>>>>>> __page_pool_get_cached():
>>>>>>
>>>>>> There core of the allocation is:
>>>>>> |         /* Caller MUST guarantee safe non-concurrent access, e.g.
>>>>>> softirq */
>>>>>> |         if (likely(pool->alloc.count)) {
>>>>>> |                 /* Fast-path */
>>>>>> |                 page = pool->alloc.cache[--pool->alloc.count];
>>>>>>
>>>>>> The access to the `cache' array and the `count' variable is not
>>>>>> locked.
>>>>>> This is fine as long as there only one consumer per pool. In my
>>>>>> understanding the intention is to have one page_pool per NAPI
>>>>>> callback
>>>>>> to ensure this.
>>>>>>
>>>>> Yes, the intention is a single PP instance is "bound" to one RX-NAPI.
>>>>
>>>> Isn't that also a misuse of page_pool->p.napi? I thought it can be set
>>>> only when page allocation and cache refill happen both inside the same
>>>> NAPI polling function. Otx2 uses workqueues to refill the queues,
>>>> meaning that consumer and producer can happen in different contexts or
>>>> even threads and it shouldn't set p.napi.
>>>>
>>>
>>> As Jakub wrote this otx2 driver doesn't set p.napi (so that part of the
>>> problem cannot happen).
>>
>> Oh I'm dumb :z
>>
>>>
>>> That said using workqueues to refill the queues is not compatible with
>>> using page_pool_alloc APIs.  This need to be fixed in driver!
>>
>> Hmmm I'm wondering now, how should we use Page Pool if we want to run
>> consume and alloc routines separately? Do you want to say we can't use
>> Page Pool in that case at all? Quoting other your reply:
>>
>>> This WQ process is not allowed to use the page_pool_alloc() API this
>>> way (from a work-queue).  The PP alloc-side API must only be used
>>> under NAPI protection.
>>
>> Who did say that? If I don't set p.napi, how is Page Pool then tied to
>> NAPI?
> 
> *I* say that (as the PP inventor) as that was the design and intent,

> *I*
> the PP inventor

Sure I got that a couple years ago, no need to keep reminding me that xD

> that this is tied to a NAPI instance and rely on the NAPI protection to
> make it safe to do lockless access to this cache array.
> 
>> Moreover, when you initially do an ifup/.ndo_open, you have to fill your
>> Rx queues. It's process context and it can happen on whichever CPU.
>> Do you mean I can't allocate pages in .ndo_open? :D
> 
> True, that all driver basically allocate from this *before* the RX-ring
> / NAPI is activated.  That is safe and "allowed" given the driver
> RX-ring is not active yet.  This use-case unfortunately also make it
> harder to add something to the PP API, that detect miss-usage of the API.
> 
> Looking again at the driver otx2_txrx.c NAPI code path also calls PP
> directly in otx2_napi_handler() will call refill_pool_ptrs() ->
> otx2_refill_pool_ptrs() -> otx2_alloc_buffer() -> __otx2_alloc_rbuf() ->
> if (pool->page_pool) otx2_alloc_pool_buf() -> page_pool_alloc_frag().
> 
> The function otx2_alloc_buffer() can also choose to start a WQ, that
> also called PP alloc API this time via otx2_alloc_rbuf() that have
> BH-disable.  Like Sebastian, I don't think this is safe!

Disabling BH doesn't look correct to me, but I don't see issues in
having consumer and producer working on different cores, as long as they
use ptr_ring with locks.

> 
> --Jesper
> 
> This can be a workaround fix:
> 
> $ git diff
> diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> index dce3cea00032..ab7ca146fddf 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> @@ -578,6 +578,10 @@ int otx2_alloc_buffer(struct otx2_nic *pfvf, struct
> otx2_cq_queue *cq,
>                 struct refill_work *work;
>                 struct delayed_work *dwork;
> 
> +               /* page_pool alloc API cannot be used from WQ */
> +               if (cq->rbpool->page_pool)
> +                       return -ENOMEM;

I believe that breaks the driver?

> +
>                 work = &pfvf->refill_wrk[cq->cq_idx];
>                 dwork = &work->pool_refill_work;
>                 /* Schedule a task if no other task is running */

Thanks,
Olek