Date:   Tue, 21 Dec 2021 18:01:18 +0000
From:   Ignat Korchagin <ignat@...udflare.com>
To:     Paolo Abeni <pabeni@...hat.com>
Cc:     Eric Dumazet <edumazet@...gle.com>,
        netdev <netdev@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>,
        David Ahern <dsahern@...nel.org>,
        Jakub Kicinski <kuba@...nel.org>,
        kernel-team <kernel-team@...udflare.com>
Subject: Re: tcp: kernel BUG at net/core/skbuff.c:3574!

On Tue, Dec 21, 2021 at 5:31 PM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On Tue, 2021-12-21 at 17:16 +0000, Ignat Korchagin wrote:
> > On Tue, Dec 21, 2021 at 3:40 PM Paolo Abeni <pabeni@...hat.com> wrote:
> > >
> > > On Tue, 2021-12-21 at 06:16 -0800, Eric Dumazet wrote:
> > > > On Tue, Dec 21, 2021 at 4:19 AM Ignat Korchagin <ignat@...udflare.com> wrote:
> > > > >
> > > > > Hi netdev,
> > > > >
> > > > > While trying to reproduce a different rare bug we're seeing in
> > > > > production I've triggered below on 5.15.9 kernel and confirmed on the
> > > > > latest netdev master tree:
> > > > >
> > > >
> > > > Nothing comes to mind. skb_shift() has not been recently changed.
> > > >
> > > > Why are you disabling TSO exactly ?
> > > >
> > > > Is GRO being used on veth needed to trigger the bug ?
> > > > (GRO was added recently to veth, I confess I did not review the patches)
> >
> > Yes, it seems enabling GRO for veth actually enables NAPI codepaths,
> > which trigger this bug (and actually another one we're investigating).
> > Through trial-and-error it seems disabling TSO is more likely to
> > trigger it at least in my dev environment. I'm not sure if this bug is
> > somehow related to the other one we're investigating, but once we have
> > a fix here I can try to verify before posting it to the mailing list.
> >
> > > This is very likely my fault. I'm investigating it right now.
> >
> > Thank you very much! Let me know if I can help somehow.
>
> I'm testing the following patch. Could you please have a spin in your
> testbed, too?

Seems with the patch the BUG does not reproduce for me anymore.

Ignat

> Thanks!
>
> Paolo
> ---
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 38f6da24f460..b490448ca42c 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -711,6 +711,14 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
>         rcu_read_lock();
>         xdp_prog = rcu_dereference(rq->xdp_prog);
>         if (unlikely(!xdp_prog)) {
> +               if (unlikely(skb_shared(skb) || skb_head_is_locked(skb))) {
> +                       struct sk_buff *nskb = skb_copy(skb, GFP_ATOMIC | __GFP_NOWARN);
> +
> +                       if (!nskb)
> +                               goto drop;
> +                       consume_skb(skb);
> +                       skb = nskb;
> +               }
>                 rcu_read_unlock();
>                 goto out;
>         }
>
>
>
