Message-ID: <CAC_iWjJmoqsC6w=9cjr5v9o+43=2t4LKeZCrEP83PBb7nRS6zw@mail.gmail.com>
Date: Wed, 23 Aug 2023 14:36:06 +0300
From: Ilias Apalodimas <ilias.apalodimas@...aro.org>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: netdev@...r.kernel.org, Ratheesh Kannoth <rkannoth@...vell.com>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Geetha sowjanya <gakula@...vell.com>, Jakub Kicinski <kuba@...nel.org>,
Jesper Dangaard Brouer <hawk@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Subbaraya Sundeep <sbhatta@...vell.com>, Sunil Goutham <sgoutham@...vell.com>,
Thomas Gleixner <tglx@...utronix.de>, hariprasad <hkelam@...vell.com>
Subject: Re: [BUG] Possible unsafe page_pool usage in octeontx2
Hi Sebastian,
Thanks for the report.
On Wed, 23 Aug 2023 at 12:48, Sebastian Andrzej Siewior
<bigeasy@...utronix.de> wrote:
>
> Hi,
>
> I've been looking at the page_pool locking.
Apologies for any traumas we caused with that code :)
>
> page_pool_alloc_frag() -> page_pool_alloc_pages() ->
> __page_pool_get_cached():
>
> The core of the allocation is:
> | /* Caller MUST guarantee safe non-concurrent access, e.g. softirq */
> | if (likely(pool->alloc.count)) {
> | /* Fast-path */
> | page = pool->alloc.cache[--pool->alloc.count];
>
> The access to the `cache' array and the `count' variable is not locked.
> This is fine as long as there is only one consumer per pool. In my
> understanding the intention is to have one page_pool per NAPI callback
> to ensure this.
>
> The pool can be filled in the same context (within allocation if the
> pool is empty). There is also page_pool_recycle_in_cache() which fills
> the pool from within skb free, for instance:
> napi_consume_skb() -> skb_release_all() -> skb_release_data() ->
> napi_frag_unref() -> page_pool_return_skb_page().
>
> The last one has the following check here:
> | napi = READ_ONCE(pp->p.napi);
> | allow_direct = napi_safe && napi &&
> | READ_ONCE(napi->list_owner) == smp_processor_id();
>
> This eventually ends in page_pool_recycle_in_cache() where it adds the
> page to the cache buffer if the check above is true (and BH is disabled).
>
> napi->list_owner is set once NAPI is scheduled and stays set until the
> poll callback has completed. It is safe to add items to the list because
> only one of the two can run on a single CPU, and their completion is
> ensured by having BH disabled the whole time.
>
> This breaks in octeontx2 where a worker is used to fill the buffer:
> otx2_pool_refill_task() -> otx2_alloc_rbuf() -> __otx2_alloc_rbuf() ->
> otx2_alloc_pool_buf() -> page_pool_alloc_frag().
>
> BH is disabled, but a page can still be added while the NAPI callback
> runs on a remote CPU, thereby corrupting the index/array.
>
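To make the corruption described above concrete, here is a minimal user-space sketch that replays one possible interleaving by hand. It is not kernel code: `struct model_cache` and `model_replay_race()` are hypothetical names, and the split of `cache[--count]` and `cache[count++] = page` into separate load and store steps is an assumption about how the unlocked accesses could interleave across two CPUs.

```c
#include <stddef.h>

#define MODEL_CACHE_SIZE 8

/* Hypothetical stand-in for the pool->alloc cache/count pair. */
struct model_cache {
	unsigned int count;
	const char *cache[MODEL_CACHE_SIZE];
};

/*
 * Replays one bad interleaving of an unlocked pop and an unlocked push.
 * Returns the entry handed to the popping side.
 */
static const char *model_replay_race(struct model_cache *c,
				     const char *recycled)
{
	/* CPU0 (refill worker, pop): loads count, computes the new index. */
	unsigned int pop_idx = c->count - 1;

	/* CPU1 (NAPI poll, push): runs to completion in between. */
	unsigned int push_idx = c->count;
	c->cache[push_idx] = recycled;
	c->count = push_idx + 1;

	/* CPU0 resumes: stores its stale index and reads the slot. */
	c->count = pop_idx;
	return c->cache[pop_idx];
}
```

Starting from two cached entries, the recycled entry ends up in slot 2 while `count` is left at 1, so in this model that entry is no longer reachable through the cache. Other interleavings could hand out the same entry twice instead.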
> API wise I would suggest to
>
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 7ff80b80a6f9f..b50e219470a36 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -612,7 +612,7 @@ __page_pool_put_page(struct page_pool *pool, struct page *page,
> page_pool_dma_sync_for_device(pool, page,
> dma_sync_size);
>
> - if (allow_direct && in_softirq() &&
> + if (allow_direct && in_serving_softirq() &&
> page_pool_recycle_in_cache(page, pool))
> return NULL;
>
FWIW we used to have that check.
commit 542bcea4be866b ("net: page_pool: use in_softirq() instead")
changed that, so maybe we should revert that overall?
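For reference, a rough user-space model of why the two checks differ. The constants follow the softirq bits of preempt_count as laid out in include/linux/preempt.h on non-PREEMPT_RT kernels, to the best of my recollection. All `model_*` names are hypothetical and only illustrate the bit arithmetic; the real helpers are macros over preempt_count().

```c
#include <stdbool.h>

/* Softirq field of preempt_count (assumed non-RT layout). */
#define SOFTIRQ_SHIFT		8
#define SOFTIRQ_OFFSET		(1U << SOFTIRQ_SHIFT)
#define SOFTIRQ_MASK		(0xffU << SOFTIRQ_SHIFT)
#define SOFTIRQ_DISABLE_OFFSET	(2 * SOFTIRQ_OFFSET)

/* Hypothetical per-CPU counter standing in for preempt_count(). */
static unsigned int model_preempt_count;

/* BH disabled, e.g. by local_bh_disable() or spin_lock_bh(). */
static void model_local_bh_disable(void)
{
	model_preempt_count += SOFTIRQ_DISABLE_OFFSET;
}

static void model_local_bh_enable(void)
{
	model_preempt_count -= SOFTIRQ_DISABLE_OFFSET;
}

/* Softirq handler actually running, e.g. the NAPI poll callback. */
static void model_softirq_enter(void)
{
	model_preempt_count += SOFTIRQ_OFFSET;
}

static void model_softirq_exit(void)
{
	model_preempt_count -= SOFTIRQ_OFFSET;
}

/* True when serving a softirq OR when BH is merely disabled. */
static bool model_in_softirq(void)
{
	return (model_preempt_count & SOFTIRQ_MASK) != 0;
}

/* True only while a softirq handler is running. */
static bool model_in_serving_softirq(void)
{
	return (model_preempt_count & SOFTIRQ_OFFSET) != 0;
}
```

In this model, a worker that only disables BH sees model_in_softirq() return true and model_in_serving_softirq() return false, which is the case the proposed change is meant to exclude from direct recycling.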
> because the intention (as I understand it) is to be invoked from within
> the NAPI callback (while softirq is served) and not if BH is just
> disabled due to a lock or so.
>
> It would also make sense to add a WARN_ON_ONCE(!in_serving_softirq()) to
> page_pool_alloc_pages() to spot usage outside of softirq. But this will
> trigger in every driver since the same function is used in the open
> callback to initially set up the HW.
What about adding a check in the cached allocation path in order to
skip the initial page allocation?
Thanks
/Ilias
>
> Sebastian