netdev - Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.

Message-ID: <cd0ea113-34ff-d24a-b798-819dfb536c76@redhat.com>
Date:   Tue, 19 Dec 2017 11:36:19 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     David Hill <dhill@...hat.com>, Paolo Bonzini <pbonzini@...hat.com>,
        kvm@...r.kernel.org
Cc:     Willem de Bruijn <willemb@...gle.com>,
        netdev <netdev@...r.kernel.org>
Subject: Re: Shutting down a VM with Kernel 4.14 will sometime hang and a
 reboot is the only way to recover.



On 2017-12-12 11:53, David Hill wrote:
>
>
> On 2017-12-08 01:03 PM, David Hill wrote:
>>
>>
>> On 2017-12-07 12:13 AM, Jason Wang wrote:
>>>
>>>
>>> On 2017-12-07 12:42, David Hill wrote:
>>>>
>>>>
>>>> On 2017-12-06 11:34 PM, David Hill wrote:
>>>>>
>>>>>
>>>>> On 2017-12-04 02:51 PM, David Hill wrote:
>>>>>>
>>>>>> On 2017-12-03 11:08 PM, Jason Wang wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2017-12-02 00:38, David Hill wrote:
>>>>>>>>>
>>>>>>>>> Finally, I reverted 581fe0ea61584d88072527ae9fb9dcb9d1f2783e 
>>>>>>>>> too ... compiling and I'll keep you posted.
>>>>>>>>
>>>>>>>> So I'm still able to reproduce this issue even after reverting 
>>>>>>>> these 3 commits.  Do you have other suspect commits?
>>>>>>>
>>>>>>> Thanks for the testing. No, I don't have other suspect commits.
>>>>>>>
>>>>>>> Looks like somebody else is hitting your issue too (see 
>>>>>>> https://www.spinics.net/lists/netdev/msg468319.html)
>>>>>>>
>>>>>>> But he claims the issue was fixed by using qemu 2.10.1.
>>>>>>>
>>>>>>> So you may:
>>>>>>>
>>>>>>> -try to see if qemu 2.10.1 solves your issue
>>>>>> It didn't solve it for him... it's only harder to reproduce. [1]
>>>>>>> -if not, try to see if commit 
>>>>>>> 2ddf71e23cc246e95af72a6deed67b4a50a7b81c ("net: add notifier 
>>>>>>> hooks for devmap bpf map") is the first bad commit
>>>>>> I'll try to see what I can do here
>>>>> I'm looking at that commit and, if I'm not mistaken, it was 
>>>>> introduced before v4.13, while this issue appeared between v4.13 
>>>>> and v4.14-rc1.  Between those two releases, there are 1352 commits.
>>>>> Is there a way to quickly find which commits touch vhost-net or 
>>>>> zerocopy?
>>>>>
>>>>>
>>>>> [ 7496.553044]  __schedule+0x2dc/0xbb0
>>>>> [ 7496.553055]  ? trace_hardirqs_on+0xd/0x10
>>>>> [ 7496.553074]  schedule+0x3d/0x90
>>>>> [ 7496.553087]  vhost_net_ubuf_put_and_wait+0x73/0xa0 [vhost_net]
>>>>> [ 7496.553100]  ? finish_wait+0x90/0x90
>>>>> [ 7496.553115]  vhost_net_ioctl+0x542/0x910 [vhost_net]
>>>>> [ 7496.553144]  do_vfs_ioctl+0xa6/0x6c0
>>>>> [ 7496.553166]  SyS_ioctl+0x79/0x90
>>>>> [ 7496.553182]  entry_SYSCALL_64_fastpath+0x1f/0xbe
>>>>
>>>> That vhost_net_ubuf_put_and_wait call has been changed in this 
>>>> commit with the following comment:
>>>>
>>>> commit 0ad8b480d6ee916aa84324f69acf690142aecd0e
>>>> Author: Michael S. Tsirkin <mst@...hat.com>
>>>> Date:   Thu Feb 13 11:42:05 2014 +0200
>>>>
>>>>     vhost: fix ref cnt checking deadlock
>>>>
>>>>     vhost checked the counter within the refcnt before decrementing.  It
>>>>     really wanted to know that it is the one that has the last reference, as
>>>>     a way to batch freeing resources a bit more efficiently.
>>>>
>>>>     Note: we only let refcount go to 0 on device release.
>>>>
>>>>     This works well but we now access the ref counter twice so there's a
>>>>     race: all users might see a high count and decide to defer freeing
>>>>     resources.
>>>>     In the end no one initiates freeing resources until the last reference
>>>>     is gone (which is on VM shotdown so might happen after a looooong time).
>>>>
>>>>     Let's do what we probably should have done straight away:
>>>>     switch from kref to plain atomic, documenting the
>>>>     semantics, return the refcount value atomically after decrement,
>>>>     then use that to avoid the deadlock.
>>>>
>>>>     Reported-by: Qin Chuanyu <qinchuanyu@...wei.com>
>>>>     Signed-off-by: Michael S. Tsirkin <mst@...hat.com>
>>>>     Acked-by: Jason Wang <jasowang@...hat.com>
>>>>     Signed-off-by: David S. Miller <davem@...emloft.net>
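
The race described in that commit message can be illustrated outside the kernel. The following is a minimal userspace sketch using C11 atomics, not the vhost_net code itself: the struct `ubuf_ref` and both function names are hypothetical, and the kernel uses its own atomic API rather than `<stdatomic.h>`. It contrasts reading the counter and then decrementing (two accesses) with a single decrement that returns the new value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-device zerocopy buffer counter. */
struct ubuf_ref {
    atomic_int refcount;
};

/*
 * Racy pattern: inspect the counter, then drop the reference.
 * Two callers can both read a value above 1 before either decrements,
 * so neither believes it held the last reference.
 */
static bool ubuf_put_check_then_dec(struct ubuf_ref *u)
{
    bool looked_last = (atomic_load(&u->refcount) == 1);
    atomic_fetch_sub(&u->refcount, 1);
    return looked_last;
}

/*
 * Pattern the commit message describes: decrement and return the new
 * value in one atomic step, so exactly one caller observes zero.
 */
static int ubuf_put(struct ubuf_ref *u)
{
    int remaining = atomic_fetch_sub(&u->refcount, 1) - 1;
    assert(remaining >= 0); /* a negative count means an unbalanced put */
    return remaining;
}
```

With the second form, a waiter such as the one blocked in the trace above only needs to be woken by whichever caller sees the count reach zero. If some other part of the stack still holds a reference, that count never reaches zero and the waiter sleeps indefinitely.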
>>>>
>>>>
>>>>
>>>> So at this point, are we hitting a deadlock when using 
>>>> experimental_zcopytx ? 
>>>
>>> Yes. But another possibility is that it was not caused by 
>>> vhost_net itself but by some other place that holds a packet.
>>>
>>> Thanks
>>
>> While bisecting, when I reached commit 
>> 46d4b68f891bee5d83a32508bfbd9778be6b1b63, the kernel panicked 
>> when I ran virt-customize:
>>
>> Message from syslogd@...pa at Dec  8 12:52:06 ...
>>  kernel:[  350.016376] Kernel panic - not syncing: Fatal exception in interrupt
>>
>> I marked that commit as bad again.   Will continue bisecting!
>>
>
> It looks like the first bad commit would be the following:
>
> [jenkins@...pa linux-stable-new]$ sudo bash bisect.sh -g
> 3ece782693c4b64d588dd217868558ab9a19bfe7 is the first bad commit
> commit 3ece782693c4b64d588dd217868558ab9a19bfe7
> Author: Willem de Bruijn <willemb@...gle.com>
> Date:   Thu Aug 3 16:29:38 2017 -0400
>
>     sock: skb_copy_ubufs support for compound pages
>
>     Refine skb_copy_ubufs to support compound pages. With upcoming TCP
>     zerocopy sendmsg, such fragments may appear.
>
>     The existing code replaces each page one for one. Splitting each
>     compound page into an independent number of regular pages can result
>     in exceeding limit MAX_SKB_FRAGS if data is not exactly page aligned.
>
>     Instead, fill all destination pages but the last to PAGE_SIZE.
>     Split the existing alloc + copy loop into separate stages:
>     1. compute bytelength and minimum number of pages to store this.
>     2. allocate
>     3. copy, filling each page except the last to PAGE_SIZE bytes
>     4. update skb frag array
>
>     Signed-off-by: Willem de Bruijn <willemb@...gle.com>
>     Signed-off-by: David S. Miller <davem@...emloft.net>
>
> :040000 040000 f1b652be7e59b1046400cad8e6be25028a88b8e2 6ecf86d9f06a2d98946f531f1e4cf803de071b10 M    include
> :040000 040000 8420cf451fcf51f669ce81437ce7e0aacc33d2eb 4fc8384362693e4619fab39b0a945f6f2349226b M    net
>
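
The staged approach listed in that commit message (compute the byte length, derive the minimum page count, fill every destination page except the last to PAGE_SIZE) can be sketched in userspace as follows. This is an illustration of the arithmetic only, not the kernel's skb_copy_ubufs: the 4096-byte page size, the `SKETCH_PAGE_SIZE` macro, and all three function names are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for the sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Stage 1a: total byte length across all source fragments. */
static size_t total_len(const size_t *frag_len, size_t nr_frags)
{
    size_t total = 0;
    for (size_t i = 0; i < nr_frags; i++)
        total += frag_len[i];
    return total;
}

/* Stage 1b: minimum number of destination pages (round up). */
static size_t pages_needed(size_t total)
{
    return (total + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/*
 * Stage 3 bookkeeping: bytes stored in destination page idx when every
 * page except the last is filled completely.
 */
static size_t dest_page_len(size_t total, size_t idx)
{
    size_t full = total / SKETCH_PAGE_SIZE;

    if (idx < full)
        return SKETCH_PAGE_SIZE;
    if (idx == full)
        return total % SKETCH_PAGE_SIZE;
    return 0; /* past the end of the data */
}
```

Packing the data this way bounds the destination fragment count by the total length, whereas splitting each compound page separately could, as the commit message notes, exceed MAX_SKB_FRAGS when the data is not page aligned.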
> Here is the bisect log:

Thanks for the hard work bisecting.

Cc netdev and Willem.


>
> [root@...pa linux-stable-new]# git bisect log
> git bisect start
> # bad: [2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e] Linux 4.14-rc1
> git bisect bad 2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e
> # good: [e87c13993f16549e77abce9744af844c55154349] Linux 4.13.16
> git bisect good e87c13993f16549e77abce9744af844c55154349
> # good: [569dbb88e80deb68974ef6fdd6a13edb9d686261] Linux 4.13
> git bisect good 569dbb88e80deb68974ef6fdd6a13edb9d686261
> # good: [569dbb88e80deb68974ef6fdd6a13edb9d686261] Linux 4.13
> git bisect good 569dbb88e80deb68974ef6fdd6a13edb9d686261
> # bad: [aae3dbb4776e7916b6cd442d00159bea27a695c1] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
> git bisect bad aae3dbb4776e7916b6cd442d00159bea27a695c1
> # good: [bf1d6b2c76eda86159519bf5c427b1fa8f51f733] Merge tag 'staging-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
> git bisect good bf1d6b2c76eda86159519bf5c427b1fa8f51f733
> # bad: [e833251ad813168253fef9915aaf6a8c883337b0] rxrpc: Add notification of end-of-Tx phase
> git bisect bad e833251ad813168253fef9915aaf6a8c883337b0
> # bad: [46d4b68f891bee5d83a32508bfbd9778be6b1b63] Merge tag 'wireless-drivers-next-for-davem-2017-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
> git bisect bad 46d4b68f891bee5d83a32508bfbd9778be6b1b63
> # good: [cf6c6ea352faadb15d1373d890bf857080b218a4] iwlwifi: mvm: fix the FIFO numbers in A000 devices
> git bisect good cf6c6ea352faadb15d1373d890bf857080b218a4
> # good: [65205cc465e9b37abbdbb3d595c46081b97e35bc] sctp: remove the typedef sctp_addiphdr_t
> git bisect good 65205cc465e9b37abbdbb3d595c46081b97e35bc
> # bad: [ecbd87b8430419199cc9dd91598d5552a180f558] phylink: add support for MII ioctl access to Clause 45 PHYs
> git bisect bad ecbd87b8430419199cc9dd91598d5552a180f558
> # bad: [52267790ef52d7513879238ca9fac22c1733e0e3] sock: add MSG_ZEROCOPY
> git bisect bad 52267790ef52d7513879238ca9fac22c1733e0e3
> # good: [04b1d4e50e82536c12da00ee04a77510c459c844] net: core: Make the FIB notification chain generic
> git bisect good 04b1d4e50e82536c12da00ee04a77510c459c844
> # good: [9217d8c2fe743f02a3ce6d430fe3b5d514fd5f1c] ipv6: Regenerate host route according to node pointer upon loopback up
> git bisect good 9217d8c2fe743f02a3ce6d430fe3b5d514fd5f1c
> # good: [0a7fd1ac2a6b316ceeb9a57a41ce0c45f6bff549] mlxsw: spectrum_router: Add support for route replace
> git bisect good 0a7fd1ac2a6b316ceeb9a57a41ce0c45f6bff549
> # good: [84b7187ca2338832e3af58eb5123c02bb6921e4e] Merge branch 'mlxsw-Support-for-IPv6-UC-router'
> git bisect good 84b7187ca2338832e3af58eb5123c02bb6921e4e
> # bad: [3ece782693c4b64d588dd217868558ab9a19bfe7] sock: skb_copy_ubufs support for compound pages
> git bisect bad 3ece782693c4b64d588dd217868558ab9a19bfe7
> # good: [98ba0bd5505dcbb90322a4be07bcfe6b8a18c73f] sock: allocate skbs from optmem
> git bisect good 98ba0bd5505dcbb90322a4be07bcfe6b8a18c73f
> # first bad commit: [3ece782693c4b64d588dd217868558ab9a19bfe7] sock: skb_copy_ubufs support for compound pages
>
>