Message-ID: <156f3e120bd0757133cb6bc11b76889637b5e0a6.camel@gmail.com>
Date: Thu, 26 Jan 2023 10:38:45 -0800
From: Alexander H Duyck <alexander.duyck@...il.com>
To: Felix Fietkau <nbd@....name>
Cc: Ilias Apalodimas <ilias.apalodimas@...aro.org>,
netdev@...r.kernel.org, Jesper Dangaard Brouer <hawk@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Lorenzo Bianconi <lorenzo@...nel.org>,
linux-kernel@...r.kernel.org, Yunsheng Lin <linyunsheng@...wei.com>
Subject: Re: [PATCH] net: page_pool: fix refcounting issues with fragmented
allocation
>
> > Which piece did you change? My main interest would be trying to narrow
> > down the section of this function that is causing this. Did you modify
> > __ieee80211_amsdu_copy or some other function within the setup?
> I replaced this line:
> bool reuse_frag = skb->head_frag && !skb_has_frag_list(skb);
> with:
> bool reuse_frag = false;
I see. So essentially everything is copied into the new buffers.
> > > > I believe the issue must be specific to that codepath, since most
> > > > received and processed packets either are not A-MSDU or have had
> > > > A-MSDU decap already performed by the hardware.
> > > > If I change my test to use 3 client mode interfaces instead of 4,
> > > > the hardware is able to offload all A-MSDU rx processing and I
> > > > don't see any crashes anymore.
> > > >
> > > > Could you please take another look at ieee80211_amsdu_to_8023s to see if
> > > > there's anything in there that could cause these issues?
> >
> > The thing is, I am not sure it is the only cause of this. I suspect
> > it is only triggering an issue when combined with something else.
> >
> > If we could add some tracing to dump the skb and list buffers, that
> > might be helpful. We would want to verify the pp_recycle value and
> > the clone flag, and for the frags we would want the frag counts and
> > page reference counts. The expectation would be that the original
> > skb should have the pp_recycle flag set and the frag counts
> > consistent throughout the process, and the list skbs should all have
> > taken page references w/ no pp_recycle set on the skbs in the list.
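Something along these lines might work as a starting point. This is an
untested sketch; dump_skb_pp_state is a made-up name, not an existing
kernel helper, and a page's pp_magic only matches PP_SIGNATURE when the
page actually came from a page pool:

#include <linux/mm.h>
#include <linux/skbuff.h>

/* Dump pp_recycle, the clone flag, and per-frag page state for an skb
 * and for every skb hanging off its frag_list.
 */
static void dump_skb_pp_state(const struct sk_buff *skb)
{
	const struct skb_shared_info *shinfo = skb_shinfo(skb);
	const struct sk_buff *iter;
	int i;

	pr_info("skb %p: pp_recycle=%u cloned=%u nr_frags=%u\n",
		skb, skb->pp_recycle, skb->cloned, shinfo->nr_frags);

	for (i = 0; i < shinfo->nr_frags; i++) {
		struct page *page = skb_frag_page(&shinfo->frags[i]);

		/* pp_magic matching PP_SIGNATURE marks a page pool page */
		pr_info("  frag %d: page=%p refcount=%d pp_magic=%lx\n",
			i, page, page_ref_count(page), page->pp_magic);
	}

	skb_walk_frags(skb, iter)
		dump_skb_pp_state(iter);
}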
> >
> > > Here are clues from a few more tests that I ran:
> > > - preventing the reuse of the last skb in ieee80211_amsdu_to_8023s does
> > > not prevent the crashes, so the issue is indeed related to taking page
> > > references and putting the pages in skb fragments.
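For reference, the pattern being pointed at looks roughly like the
sketch below. This is illustrative only, not the actual
net/wireless/util.c code, and attach_frag is a hypothetical name:

#include <linux/mm.h>
#include <linux/skbuff.h>

/* Attach a page still referenced by the original skb to a new skb as a
 * fragment, taking an extra page reference so both skbs can later be
 * freed independently.
 */
static int attach_frag(struct sk_buff *nskb, struct page *page,
		       int offset, int size)
{
	if (skb_shinfo(nskb)->nr_frags >= MAX_SKB_FRAGS)
		return -ENOMEM;

	get_page(page);
	skb_add_rx_frag(nskb, skb_shinfo(nskb)->nr_frags, page,
			offset, size, size);
	return 0;
}

Note that nskb never gets pp_recycle set here, even though the page it
now holds may still belong to a page pool.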
> >
> > You said in the first email that it stops the crashes, and in the
> > second that it "does not". I am assuming that is some sort of typo,
> > since you seem to be implying it does resolve it for you. Is that
> > correct?
> For everything except the last subframe, the function pulls fragments
> out of the original skb and puts them into a new skb allocated with
> dev_alloc_skb. The original skb is then reused for the final subframe.
> What I meant was: in order to figure out whether the skb reuse is
> problematic, I prevented it from happening, ensuring that all
> subframes are freshly allocated and take the fragments from the
> original skb.
> All I meant to say is that since that led to the same crash, the skb
> reuse is not the problem.
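(To make the flow concrete, the structure being described is roughly
the sketch below. It is heavily simplified and hypothetical; the real
code is ieee80211_amsdu_to_8023s/__ieee80211_amsdu_copy in
net/wireless/util.c.)

#include <linux/skbuff.h>

/* Return the skb carrying one decapsulated A-MSDU subframe: the
 * original skb is reused for the final subframe, while every earlier
 * subframe is carved into a freshly allocated skb.
 */
static struct sk_buff *amsdu_subframe_skb(struct sk_buff *skb,
					  int offset, int len, bool last)
{
	struct sk_buff *frame;

	if (last)
		return skb;	/* final subframe: reuse the original skb */

	frame = dev_alloc_skb(len);
	if (!frame)
		return NULL;

	/* skb_copy_bits stands in for the frag-stealing fast path,
	 * which attaches the original pages with get_page() instead
	 * of copying the payload.
	 */
	if (skb_copy_bits(skb, offset, skb_put(frame, len), len)) {
		kfree_skb(frame);
		return NULL;
	}

	return frame;
}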
>
> Regarding the question from your other mail: without GRO there is no
> crash and it looks stable.
>
> - Felix
>
Okay, I think that tells me exactly what is going on. Can you give the
change below a try and see if it solves the problem for you?
I think what is happening is that after you reassign the frags, they
are getting merged into GRO frames whose head skb may have pp_recycle
set. As a result the pages are getting recycled when they should
instead just be freed via put_page.
I'm suspecting this wasn't an issue up until now because I don't
believe any drivers have been running in a mixed mode where both
pp_recycle and non-pp_recycle skbs come from the same device.
diff --git a/net/core/gro.c b/net/core/gro.c
index 506f83d715f8..4bac7ea6e025 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -162,6 +162,15 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
 	struct sk_buff *lp;
 	int segs;
 
+	/* Do not splice page pool based packets w/ non-page pool
+	 * packets. This can result in reference count issues as page
+	 * pool pages will not decrement the reference count and will
+	 * instead be immediately returned to the pool or have frag
+	 * count decremented.
+	 */
+	if (p->pp_recycle != skb->pp_recycle)
+		return -ETOOMANYREFS;
+
 	/* pairs with WRITE_ONCE() in netif_set_gro_max_size() */
 	gro_max_size = READ_ONCE(p->dev->gro_max_size);
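
For what it's worth, existing callers of skb_gro_receive() already
treat any nonzero return as "could not coalesce" and fall back to
delivering the skb on its own, so the -ETOOMANYREFS above simply keeps
the two packets on separate release paths. tcp_gro_receive() does
roughly the following (paraphrased from net/ipv4/tcp_offload.c):

	if (flush || skb_gro_receive(p, skb)) {
		mss = 1;
		goto out_check_final;
	}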