Message-ID: <CALrw=nEAEqq71Bwn0tJvFum3a1Ht6ynGedjH7uFpfFgSOU1AHg@mail.gmail.com>
Date: Tue, 21 Dec 2021 18:01:18 +0000
From: Ignat Korchagin <ignat@...udflare.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: Eric Dumazet <edumazet@...gle.com>,
netdev <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
David Ahern <dsahern@...nel.org>,
Jakub Kicinski <kuba@...nel.org>,
kernel-team <kernel-team@...udflare.com>
Subject: Re: tcp: kernel BUG at net/core/skbuff.c:3574!
On Tue, Dec 21, 2021 at 5:31 PM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On Tue, 2021-12-21 at 17:16 +0000, Ignat Korchagin wrote:
> > On Tue, Dec 21, 2021 at 3:40 PM Paolo Abeni <pabeni@...hat.com> wrote:
> > >
> > > On Tue, 2021-12-21 at 06:16 -0800, Eric Dumazet wrote:
> > > > On Tue, Dec 21, 2021 at 4:19 AM Ignat Korchagin <ignat@...udflare.com> wrote:
> > > > >
> > > > > Hi netdev,
> > > > >
> > > > > While trying to reproduce a different rare bug we're seeing in
> > > > > production I've triggered below on 5.15.9 kernel and confirmed on the
> > > > > latest netdev master tree:
> > > > >
> > > >
> > > > Nothing comes to mind. skb_shift() has not been recently changed.
> > > >
> > > > Why are you disabling TSO exactly ?
> > > >
> > > > Is GRO being used on veth needed to trigger the bug ?
> > > > (GRO was added recently to veth, I confess I did not review the patches)
> >
> > Yes, it seems enabling GRO for veth actually enables NAPI codepaths,
> > which trigger this bug (and actually another one we're investigating).
> > Through trial-and-error it seems disabling TSO is more likely to
> > trigger it at least in my dev environment. I'm not sure if this bug is
> > somehow related to the other one we're investigating, but once we have
> > a fix here I can try to verify before posting it to the mailing list.
> >
> > > This is very likely my fault. I'm investigating it right now.
> >
> > Thank you very much! Let me know if I can help somehow.
>
> I'm testing the following patch. Could you please have a spin in your
> testbed, too?
It seems that with the patch applied, the BUG no longer reproduces for me.

Ignat
> Thanks!
>
> Paolo
> ---
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 38f6da24f460..b490448ca42c 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -711,6 +711,14 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
>         rcu_read_lock();
>         xdp_prog = rcu_dereference(rq->xdp_prog);
>         if (unlikely(!xdp_prog)) {
> +               if (unlikely(skb_shared(skb) || skb_head_is_locked(skb))) {
> +                       struct sk_buff *nskb = skb_copy(skb, GFP_ATOMIC | __GFP_NOWARN);
> +
> +                       if (!nskb)
> +                               goto drop;
> +                       consume_skb(skb);
> +                       skb = nskb;
> +               }
>                 rcu_read_unlock();
>                 goto out;
>         }
>
>
>
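For readers following the patch above, here is a rough userspace sketch of the "copy before use if shared" pattern it applies. This is not kernel code: struct buf and the buf_* helpers are hypothetical stand-ins invented for illustration, and the sketch models only the shared-reference check (the analogue of skb_shared()), not the skb_head_is_locked() condition.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for a reference-counted packet buffer. */
struct buf {
	int users;            /* number of holders of this buffer */
	size_t len;           /* payload length in bytes */
	unsigned char *data;  /* payload owned by this buffer */
};

/* Allocate a buffer holding a private copy of src; NULL on failure. */
static struct buf *buf_alloc(const void *src, size_t len)
{
	struct buf *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->data = malloc(len ? len : 1);
	if (!b->data) {
		free(b);
		return NULL;
	}
	memcpy(b->data, src, len);
	b->len = len;
	b->users = 1;
	return b;
}

/* True when someone else also holds a reference. */
static int buf_shared(const struct buf *b)
{
	return b->users > 1;
}

/* Drop one reference; free the buffer when the last one goes away. */
static void buf_put(struct buf *b)
{
	assert(b->users > 0);
	if (--b->users == 0) {
		free(b->data);
		free(b);
	}
}

/*
 * Return a buffer the caller holds exclusively.
 * If b is shared, copy it and drop our reference to the original.
 * On allocation failure return NULL; the caller still holds b and
 * decides what to do with it (the patch drops the packet).
 */
static struct buf *buf_make_private(struct buf *b)
{
	struct buf *nb;

	if (!buf_shared(b))
		return b;

	nb = buf_alloc(b->data, b->len);
	if (!nb)
		return NULL;
	buf_put(b);
	return nb;
}
```

In this model, calling buf_make_private() on a buffer with two holders returns a fresh copy and leaves the original with a single holder, so later modifications through the returned pointer cannot affect the other holder. Calling it on an unshared buffer returns the same pointer unchanged.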