linux-kernel - Re: Shutting down a VM with Kernel 4.14 will sometime hang and a reboot is the only way to recover.

Message-ID: <1625c485-ef1d-1fd8-0d4e-99fd6bd631ea@hakimo.net>
Date:   Sat, 2 Dec 2017 13:16:06 +0100
From:   Harald Moeller <h.moeller@...imo.net>
To:     linux-kernel@...r.kernel.org
Subject: Re: Shutting down a VM with Kernel 4.14 will sometime hang and a
 reboot is the only way to recover.

Hello, my name is Harry and this is my first post here. I hope I'm doing
this the right way; sorry if not ...

I'm not subscribed to the full list yet, so please CC me personally on
replies.

I am following this thread because I am seeing the same (or a very
similar) issue with 4.14.2.

My setup is simpler: just an oVirt host shutting down some VMs. It
doesn't happen every time, but I'd say in around 3 out of 10 shutdowns.

This is what I see (slightly different from what David reported):

Dec 01 16:11:53 oVirtHost01.xyz.net kernel: INFO: task qemu-kvm:1173 blocked for more than 120 seconds.
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:       Tainted: G          I     4.14.2-1.el7.hakimo.x86_64 #4
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: qemu-kvm        D 0  1173      1 0x00000084
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: Call Trace:
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  __schedule+0x28d/0x880
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  schedule+0x36/0x80
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  vhost_net_ubuf_put_and_wait+0x61/0x90 [vhost_net]
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  ? remove_wait_queue+0x60/0x60
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  vhost_net_ioctl+0x317/0x8e0 [vhost_net]
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  do_vfs_ioctl+0xa7/0x5f0
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  SyS_ioctl+0x79/0x90
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  do_syscall_64+0x67/0x1b0
Dec 01 16:11:53 oVirtHost01.xyz.net kernel:  entry_SYSCALL64_slow_path+0x25/0x25
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RIP: 0033:0x7fb8862d1107
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RSP: 002b:00007fff4acd7e58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RAX: ffffffffffffffda RBX: 000055abaa2d29c0 RCX: 00007fb8862d1107
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RDX: 00007fff4acd7e60 RSI: 000000004008af30 RDI: 0000000000000028
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: RBP: 00007fff4acd7e60 R08: 000055aba805e10f R09: 00000000ffffffff
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: R10: 0000000000000004 R11: 0000000000000246 R12: 000055ababf32510
Dec 01 16:11:53 oVirtHost01.xyz.net kernel: R13: 0000000000000001 R14: 000055ababf32498 R15: 000055abaa2a0b40
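
A side note on the register dump: ORIG_RAX is 0x10, which is the ioctl
syscall on x86_64, so RSI should hold the ioctl request number. The
small sketch below splits that value into its fields, assuming the
generic _IOC bit layout from asm-generic/ioctl.h (nr in bits 0-7, type
in bits 8-15, size in bits 16-29, direction in bits 30-31). The helper
name decode_ioctl is made up for illustration.

```python
# Sketch: split a Linux ioctl request number into its _IOC fields.
# Assumes the generic layout used on x86 (asm-generic/ioctl.h).
# decode_ioctl is a hypothetical helper name, not a kernel or libc API.

DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read/write"}


def decode_ioctl(request):
    """Return the nr, type, size and direction fields of an ioctl number."""
    return {
        "nr": request & 0xFF,               # command number within the type
        "type": (request >> 8) & 0xFF,      # "magic" byte of the subsystem
        "size": (request >> 16) & 0x3FFF,   # size of the argument struct
        "dir": DIRECTIONS[(request >> 30) & 0x3],
    }


# RSI value taken from the hung-task report above.
fields = decode_ioctl(0x4008AF30)
print("type=0x%02x nr=0x%02x size=%d dir=%s"
      % (fields["type"], fields["nr"], fields["size"], fields["dir"]))
```

If I read include/uapi/linux/vhost.h correctly, type 0xAF is
VHOST_VIRTIO, and a write-direction request with nr 0x30 and an 8-byte
argument is VHOST_NET_SET_BACKEND. That would fit the vhost_net_ioctl
frame in the call trace, but please treat it as my reading of the
header, not a confirmed diagnosis.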

This is still happening after reverting the three suggested commits:

1f8b977ab32dc5d148f103326e80d9097f1cefb5 ("sock: enable MSG_ZEROCOPY")

c1d1b437816f0afa99202be3cb650c9d174667bc ("net: convert (struct ubuf_info)->refcnt to refcount_t")

581fe0ea61584d88072527ae9fb9dcb9d1f2783e ("net: orphan frags on stand-alone ptype in dev_queue_xmit_nit")

Is there anything I can do to help solve this? Is there any more
information I could provide?

Harry