Message-ID: <7aba73b08b4eef98e9bebff83499b25f1c332da5.camel@gmail.com>
Date:   Fri, 12 Jun 2020 11:10:17 +0200
From:   Igor Raits <igor.raits@...il.com>
To:     linux-kernel@...r.kernel.org
Subject: KVM hangs with many soft lockups purge_fragmented_blocks_allcpus

Hello,
We're running CentOS 7 with a 5.4.x kernel (5.4.34) on our hosts, and
the VMs on them (qemu-kvm) run the same OS and kernel.

We have already seen a few times that a VM gets stuck with a bunch of
kernel messages about soft lockups and the like.

I have been searching for whether any of these traces have already been
fixed since the v5.4 tag, but I found only some fixes related to
address sanitizers, and those did not seem to be candidates for fixing
this issue.

Unfortunately we don't have a reproducer for this, but it did not seem
to happen on 4.19.x kernels. Any help is much appreciated!


[3142674.625842][T283860] BUG: unable to handle page fault for address: ffff893ebf410362
...
[3142757.454613][    C4] CPU: 4 PID: 283852 Comm: vertica Tainted: G            EL    5.4.34-1.el7.gdc.x86_64 #1
[3142757.455863][    C5] CPU: 5 PID: 284163 Comm: vertica Tainted: G            EL    5.4.34-1.el7.gdc.x86_64 #1
[3142757.456482][    C7] watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [sshd:284646]
...
[3142757.456505][    C7] CPU: 7 PID: 284646 Comm: sshd Tainted: G            EL    5.4.34-1.el7.gdc.x86_64 #1
...
[3142757.456511][    C7] RIP: 0010:smp_call_function_many+0x21e/0x280
...
[3142757.456521][    C7] Call Trace:
[3142757.456526][    C7]  ? do_kernel_range_flush+0x50/0x50
[3142757.456527][    C7]  on_each_cpu+0x28/0x50
[3142757.456530][    C7]  __purge_vmap_area_lazy+0x86/0x680
[3142757.456531][    C7]  ? purge_fragmented_blocks_allcpus+0x42/0x1f0
[3142757.456533][    C7]  _vm_unmap_aliases+0xf2/0x130
[3142757.456535][    C7]  change_page_attr_set_clr+0xb6/0x1e0
[3142757.456537][    C7]  set_memory_ro+0x2d/0x40
[3142757.456539][    C7]  bpf_int_jit_compile+0x2ee/0x3e0
[3142757.456543][    C7]  bpf_prog_select_runtime+0xdf/0x140
[3142757.456546][    C7]  bpf_prepare_filter+0x597/0x610
[3142757.456547][    C7]  bpf_prog_create_from_user+0xd5/0x120
[3142757.456550][    C7]  ? hardlockup_detector_perf_cleanup+0xa0/0xa0
[3142757.456551][    C7]  do_seccomp+0x294/0x8c0
[3142757.456554][    C7]  __x64_sys_prctl+0x280/0x58d
[3142757.456557][    C7]  do_syscall_64+0x5b/0x1b0
[3142757.456560][    C7]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
...
[3142757.459517][    C8] Call Trace:
[3142757.459521][    C8]  ? flush_tlb_func_common.isra.8+0x230/0x230
[3142757.459522][    C8]  ? flush_tlb_func_common.isra.8+0x230/0x230
[3142757.459523][    C8]  on_each_cpu_mask+0x24/0x60
[3142757.459524][    C8]  ? x86_configure_nx+0x50/0x50
[3142757.459525][    C8]  on_each_cpu_cond_mask+0xa5/0x140
[3142757.459526][    C8]  ? flush_tlb_func_common.isra.8+0x147/0x230
[3142757.459527][    C8]  flush_tlb_mm_range+0xab/0xe0
[3142757.459530][    C8]  ptep_clear_flush+0x56/0x60
[3142757.459532][    C8]  wp_page_copy+0x34a/0x7e0
[3142757.459534][    C8]  do_wp_page+0x8b/0x660
[3142757.459536][    C8]  __handle_mm_fault+0x777/0xef0
[3142757.459538][    C8]  handle_mm_fault+0xe2/0x1f0
[3142757.459541][    C8]  __do_page_fault+0x226/0x490
[3142757.459543][    C8]  do_page_fault+0x31/0x120
[3142757.459545][    C8]  async_page_fault+0x3e/0x50
...

The full dmesg is attached. After the last line the VM froze, and we
were not able to get a crash dump of the VM, due to either a libvirt
bug or something similar.
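
When going through a saved copy of the attached dmesg, the soft lockup
reports can be tallied per CPU with a small script. What follows is only
a rough sketch in Python: the function names soft_lockups and
lockups_per_cpu are made up for illustration, and the regular expression
assumes the watchdog message format shown in the excerpt above.

```python
import re
from collections import Counter

# Matches e.g. "soft lockup - CPU#7 stuck for 22s! [sshd:284646]".
# Assumption: the message format is the one seen in the excerpt above.
SOFT_LOCKUP = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def soft_lockups(text):
    """Return a (cpu, seconds, comm, pid) tuple per soft lockup report."""
    # Mail clients wrap long log lines, so collapse whitespace runs first.
    flat = " ".join(text.split())
    return [
        (int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"]))
        for m in SOFT_LOCKUP.finditer(flat)
    ]


def lockups_per_cpu(text):
    """Count soft lockup reports per CPU number."""
    return Counter(cpu for cpu, _, _, _ in soft_lockups(text))
```

Passing the contents of dmesg.txt to lockups_per_cpu() gives a count of
reports per CPU, which may help to see whether a few CPUs or all of them
were stuck.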
-- 
Igor Raits <igor.raits@...il.com>

View attachment "dmesg.txt" of type "text/plain" (170575 bytes)

Download attachment "signature.asc" of type "application/pgp-signature" (834 bytes)