[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YkRP7XwvdgFbvGsk@myrica>
Date: Wed, 30 Mar 2022 13:41:17 +0100
From: Jean-Philippe Brucker <jean-philippe@...aro.org>
To: Alexander H Duyck <alexander.duyck@...il.com>
Cc: ilias.apalodimas@...aro.org, hawk@...nel.org,
alexanderduyck@...com, linyunsheng@...wei.com, davem@...emloft.net,
kuba@...nel.org, pabeni@...hat.com, netdev@...r.kernel.org
Subject: Re: [PATCH net v2] skbuff: disable coalescing for page_pool fragment
recycling
On Mon, Mar 28, 2022 at 08:03:46AM -0700, Alexander H Duyck wrote:
> > (3b) Now while handling TCP, coalesce SKB3 with SKB1:
> >
> > tcp_v4_rcv(SKB3)
> > tcp_try_coalesce(to=SKB1, from=SKB3) // succeeds
> > kfree_skb_partial(SKB3)
> > skb_release_data(SKB3) // drops one dataref
> >
> > SKB1 _____ PAGE1
> > \____
> > SKB2 _____ PAGE2
> > /
> > RX_BD3 _________/
> >
> > In tcp_try_coalesce(), __skb_frag_ref() takes a page reference to
> > PAGE2, where it should instead have increased the page_pool frag
> > reference, pp_frag_count. Without coalescing, when releasing both
> > SKB2 and SKB3, a single reference to PAGE2 would be dropped. Now
> > when releasing SKB1 and SKB2, two references to PAGE2 will be
> > dropped, resulting in underflow.
> >
> > (3c) Drop SKB2:
> >
> > af_packet_rcv(SKB2)
> > consume_skb(SKB2)
> > skb_release_data(SKB2) // drops second dataref
> > page_pool_return_skb_page(PAGE2) // drops one pp_frag_count
> >
> > SKB1 _____ PAGE1
> > \____
> > PAGE2
> > /
> > RX_BD3 _________/
> >
> > (4) Userspace calls recvmsg()
> > Copies SKB1 and releases it. Since SKB3 was coalesced with SKB1, we
> > release the SKB3 page as well:
> >
> > tcp_eat_recv_skb(SKB1)
> > skb_release_data(SKB1)
> > page_pool_return_skb_page(PAGE1)
> > page_pool_return_skb_page(PAGE2) // drops second pp_frag_count
> >
> > (5) PAGE2 is freed, but the third RX descriptor was still using it!
> > In our case this causes IOMMU faults, but it would silently corrupt
> > memory if the IOMMU was disabled.
> >
> > Prevent coalescing the SKB if it may hold shared page_pool fragment
> > references.
> >
> > Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool")
> > Signed-off-by: Jean-Philippe Brucker <jean-philippe@...aro.org>
> > ---
> > net/core/skbuff.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 10bde7c6db44..56b45b9f0b4d 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -5276,6 +5276,13 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
> > if (skb_cloned(to))
> > return false;
> >
> > + /* We don't support taking page_pool frag references at the moment.
> > + * If the SKB is cloned and could have page_pool frag references, don't
> > + * coalesce it.
> > + */
> > + if (skb_cloned(from) && from->pp_recycle)
> > + return false;
> > +
> > /* The page pool signature of struct page will eventually figure out
> > * which pages can be recycled or not but for now let's prohibit slab
> > * allocated and page_pool allocated SKBs from being coalesced.
>
>
> This is close but not quite. Actually now that I think about it we can
> probably alter the block below rather than adding a new one.
>
> The issue is we want only reference counted pages in standard skbs, and
> pp_frag_count pages in pp_recycle skbs. So we already had logic along
> the lines of:
> if (to->pp_recycle != from->pp_recycle)
> return false;
>
> I would say we need to change that because from->pp_recycle is the
> piece that is probably too simplistic. Basically we will get a page
> pool page if from->pp_recycle && !skb_cloned(from). So we can probably
> just tweak the check below to be something along the lines of:
> if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from)))
> return false;
Just to confirm this is fine: the behavior now changes for
to->pp_recycle == 0, from->pp_recycle == 1 and skb_cloned(from) == 1
In this case we now coalesce and take a page ref. So the page has two refs
and two pp_frag_count. (3c) drops one pp_frag_count. If there wasn't
another RX desc holding a pp_frag_count (ie. no step (5)), that would also
drop a page ref, but since 'to' SKB is holding a second page ref the page
is not recycled. That reference gets dropped at (4) and the page is freed
there. With step (5), the page would get recycled into page_pool, but
without (5) the page is discarded.
I guess it works, just want to make sure that it's OK to mix page_pool
pp_frag_count and normal reference counting at the same time.
Thanks,
Jean
>
> That should actually increase the number of cases where we can
> correctly coalesce frames while also addressing the issue you
> discovered as it will only merge cloned skbs into standard skbs instead
> of page pool ones.
>
Powered by blists - more mailing lists