netdev - Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio

lists.openwall.net
Open Source and information security mailing list archives
Date:   Mon, 18 Dec 2017 18:11:06 +0100
From:   Andreas Hartmann <andihartmann@...enet.de>
To:     Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc:     Michal Kubecek <mkubecek@...e.cz>,
        Jason Wang <jasowang@...hat.com>,
        David Miller <davem@...emloft.net>,
        Network Development <netdev@...r.kernel.org>
Subject: Re: Linux 4.14 - regression: broken tun/tap / bridge network with
 virtio - bisected

On 12/17/2017 at 11:33 PM Willem de Bruijn wrote:
> On Fri, Dec 15, 2017 at 1:05 AM, Andreas Hartmann
> <andihartmann@...19freenet.de> wrote:
>> On 12/14/2017 at 11:17 PM Willem de Bruijn wrote:
>>>>> Well, the patch does not fix hanging VMs, which have been shut down and
>>>>> can't be killed any more.
>>>>> Given the following stack trace
>>>>>
>>>>> [<ffffffffc0d0e3c5>] vhost_net_ubuf_put_and_wait+0x35/0x60 [vhost_net]
>>>>> [<ffffffffc0d0f264>] vhost_net_ioctl+0x304/0x870 [vhost_net]
>>>>> [<ffffffff9b25460f>] do_vfs_ioctl+0x8f/0x5c0
>>>>> [<ffffffff9b254bb4>] SyS_ioctl+0x74/0x80
>>>>> [<ffffffff9b00365b>] do_syscall_64+0x5b/0x100
>>>>> [<ffffffff9b78e7ab>] entry_SYSCALL64_slow_path+0x25/0x25
>>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>>>
>>>>> I had hoped that the problems were related, but that does not seem to
>>>>> be the case.
>>>>
>>>> However, it turned out that reverting the complete patchset "Remove UDP
>>>> Fragmentation Offload support" prevents the qemu processes from hanging.
>>>
>>> That implies a combination of UFO and vhost zerocopy. Disabling
>>> experimental_zcopytx in vhost_net will probably work around the bug
>>> then.
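To check whether that workaround is actually in effect on a host, the module parameter can be read back at runtime. A minimal sketch, assuming vhost_net is loaded and exposes the parameter at /sys/module/vhost_net/parameters/experimental_zcopytx (the usual sysfs location for a module parameter); the helper read_int_param is hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/* Assumed sysfs location of the vhost_net parameter (module must be loaded). */
#define ZCOPYTX_PARAM "/sys/module/vhost_net/parameters/experimental_zcopytx"

/* Hypothetical helper: read one integer module parameter from `path`.
 * Returns the value, or -1 if the file cannot be opened or parsed. */
static int read_int_param(const char *path)
{
	FILE *f;
	int val;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}
```

A result of 0 from read_int_param(ZCOPYTX_PARAM) means zerocopy TX is disabled. As far as we can tell the parameter is read-only at runtime, so it has to be given at module load time, for example via an "options vhost_net experimental_zcopytx=0" line as used later in this thread.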
> 
> I have been able to reproduce the hang by sending a UFO packet
> between two guests running v4.13 on a host running v4.15-rc1.
> 
> The vhost_net_ubuf_ref refcount does indeed underflow (to -1), because
> vhost_zerocopy_callback is called for each segment of a segmented UFO
> skb. The refcount is decremented once per segment, but incremented only
> once for the entire UFO skb.
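The imbalance can be illustrated with a simplified userspace model. This is a sketch, not the vhost_net code: it assumes that the ubuf_ref counter starts at 1 (a base reference that vhost_net_ubuf_put_and_wait drops before waiting for zero), gains one reference per zerocopy packet, and loses one per completion callback. All struct and function names are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct vhost_net_ubuf_ref; the kernel uses an
 * atomic counter, a plain int is enough for this single-threaded model. */
struct ubuf_ref_model {
	int refcount;
};

/* Stand-in for vhost_zerocopy_callback(): drops one reference. */
static void zerocopy_callback_model(struct ubuf_ref_model *ref)
{
	ref->refcount--;
}

/* Transmit one UFO packet that is split into nr_segs segments. One
 * reference is taken for the whole packet, but if every segment carries
 * the zerocopy state, the callback runs once per segment. */
static void send_ufo_model(struct ubuf_ref_model *ref, int nr_segs)
{
	assert(nr_segs > 0);
	ref->refcount++;
	for (int i = 0; i < nr_segs; i++)
		zerocopy_callback_model(ref);
}

/* Stand-in for vhost_net_ubuf_put_and_wait(): drop the base reference and
 * report whether the count reached zero. The kernel instead sleeps until
 * it does, which never happens once the count has gone negative. */
static bool put_and_check_model(struct ubuf_ref_model *ref)
{
	ref->refcount--;
	return ref->refcount == 0;
}
```

With nr_segs = 1 the counter returns to 0 and the final put succeeds; with nr_segs = 2 it ends at -1, the state in which vhost_net_ubuf_put_and_wait would wait forever, consistent with the stack trace quoted earlier.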
> 
> Before v4.14, skb_segment would convert these packets to regular copy
> packets with skb_orphan_frags, calling the callback function once at that
> point. v4.14 added support for reference-counted zerocopy skbs, which can
> pass through skb_orphan_frags unmodified and have their zerocopy state
> safely cloned with skb_zerocopy_clone.
> 
> The call to skb_zerocopy_clone must come after skb_orphan_frags
> to limit cloning of this state to those skbs that can do so safely.
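A toy model of that ordering rule: skb_orphan_frags leaves refcounted state (sock_zerocopy_callback) in place but copies and completes any other zerocopy skb, so a clone taken afterwards can only propagate state that is safe to share. The enum, types and function bodies below are hypothetical simplifications, far simpler than skb_segment itself; under the old order the model only shows that at least one segment inherits unsafe state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of an skb's zerocopy state. */
enum zc_state {
	ZC_NONE,       /* no zerocopy state attached */
	ZC_REFCOUNTED, /* safe to share, e.g. sock_zerocopy_callback */
	ZC_OTHER,      /* e.g. vhost_zerocopy_callback: must not be cloned */
};

struct skb_model {
	enum zc_state zc;
};

/* Model of skb_orphan_frags(): refcounted state passes through untouched;
 * any other zerocopy skb gets its data copied and its state completed. */
static void orphan_frags_model(struct skb_model *skb)
{
	if (skb->zc == ZC_OTHER)
		skb->zc = ZC_NONE;
}

/* Model of skb_zerocopy_clone(): share orig's state with nskb. */
static void zerocopy_clone_model(struct skb_model *nskb,
				 const struct skb_model *orig)
{
	if (orig->zc != ZC_NONE)
		nskb->zc = orig->zc;
}

/* Split head into nr_segs segments and return how many segments ended up
 * carrying non-refcounted (unsafe) zerocopy state. */
static int segment_model(struct skb_model head, int nr_segs,
			 bool orphan_first)
{
	int unsafe = 0;

	assert(nr_segs > 0);
	for (int i = 0; i < nr_segs; i++) {
		struct skb_model seg = { ZC_NONE };

		if (orphan_first) {	/* order used by the fix */
			orphan_frags_model(&head);
			zerocopy_clone_model(&seg, &head);
		} else {		/* v4.14 order: clone first */
			zerocopy_clone_model(&seg, &head);
			orphan_frags_model(&head);
		}
		if (seg.zc == ZC_OTHER)
			unsafe++;
	}
	return unsafe;
}
```

With the fixed order, no segment of a vhost-style skb keeps the non-refcounted state, while refcounted state is still shared by every segment.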
> 
> Please try a host with the following patch. This fixes it for me. I intend to
> send it to net.
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index a592ca025fc4..d2d985418819 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3654,8 +3654,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> 
>                 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>                                               SKBTX_SHARED_FRAG;
> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
> -                       goto err;
> 
>                 while (pos < offset + len) {
>                         if (i >= nfrags) {
> @@ -3681,6 +3679,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> 
>                         if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>                                 goto err;
> +                       if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
> +                               goto err;
> 
>                         *nskb_frag = *frag;
>                         __skb_frag_ref(nskb_frag);
> 
> 
> This is relatively inefficient, as it calls skb_zerocopy_clone for each frag
> in the frags[] array. I will follow up with a patch to net-next that only
> checks once per skb:
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 466581cf4cdc..a293a33604ec 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3662,7 +3662,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> 
>                 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>                                               SKBTX_SHARED_FRAG;
> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
> +               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
> +                   skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>                         goto err;
> 
>                 while (pos < offset + len) {
> @@ -3676,6 +3677,11 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> 
>                                 BUG_ON(!nfrags);
> 
> +                               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
> +                                   skb_zerocopy_clone(nskb, frag_skb,
> +                                                      GFP_ATOMIC))
> +                                       goto err;
> +
>                                 list_skb = list_skb->next;
>                         }
> 
> @@ -3687,9 +3693,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>                                 goto err;
>                         }
> 
> -                       if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
> -                               goto err;
> -

I'm currently testing this one.
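The difference between the two patches is how often the orphan/clone pair runs while one segment is assembled. A rough count, assuming a hypothetical array src[] that records, for each fragment copied into the segment, which source skb (head_skb or a frag_list member) it came from:

```c
#include <assert.h>
#include <stddef.h>

/* Fix for net: orphan + clone runs once per fragment. */
static size_t calls_per_frag(const int *src, size_t nfrags)
{
	(void)src;
	return nfrags;
}

/* net-next follow-up: orphan + clone runs once before the frag loop for the
 * current frag_skb, and again each time frag_skb moves to the next list_skb. */
static size_t calls_per_skb(const int *src, size_t nfrags)
{
	size_t calls = 1;

	assert(src != NULL || nfrags == 0);
	for (size_t i = 1; i < nfrags; i++)
		if (src[i] != src[i - 1])	/* switched to another source skb */
			calls++;
	return calls;
}
```

For a segment built from five fragments spread over two source skbs, that is five calls with the net fix versus two with the net-next variant.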

> 
> I'll also send to net-next
> 
> (1) a patch to convert the vhost_net_ubuf_ref refcnt to refcount_t
> 
> (2) a patch to skb_zerocopy_clone to warn when cloning state whose
>      callback is not sock_zerocopy_callback
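The point of moving to refcount_t is that an underflow like the one above gets reported instead of silently producing -1. A userspace sketch of that behaviour, loosely modelled on refcount_dec_and_test() from include/linux/refcount.h with full checking enabled; the struct and function names are hypothetical, and the kernel's exact behaviour depends on its configuration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical checked counter. */
struct checked_ref {
	unsigned int refs;
	bool warned;	/* set once an underflow has been detected */
};

/* Decrement and report whether the count hit zero. Decrementing a count
 * that is already zero is refused and flagged instead of wrapping around. */
static bool checked_dec_and_test(struct checked_ref *r)
{
	if (r->refs == 0) {
		if (!r->warned)
			fprintf(stderr, "refcount underflow; use-after-free\n");
		r->warned = true;
		return false;
	}
	return --r->refs == 0;
}
```

With a plain atomic counter, the extra per-segment decrement in vhost_zerocopy_callback goes unnoticed until the hang; with a checked counter it would trigger a warning at the offending put.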
> 
>> I already tested it with options vhost_net experimental_zcopytx=0, but
>> this didn't "resolve" anything. See
>> https://www.mail-archive.com/netdev@vger.kernel.org/msg203197.html
>>
>> Therefore, I'm afraid your reasoning below no longer applies,
>> does it?
> 
> That experiment was perhaps run before commit 0c19f846d582 ("net:
> accept UFO datagrams from tuntap and packet") and hit the other UFO
> bug.

That's probably true.


Thanks,
Andreas