Message-ID: <7aba73b08b4eef98e9bebff83499b25f1c332da5.camel@gmail.com>
Date: Fri, 12 Jun 2020 11:10:17 +0200
From: Igor Raits <igor.raits@...il.com>
To: linux-kernel@...r.kernel.org
Subject: KVM hangs with many soft lockups purge_fragmented_blocks_allcpus
Hello,
We're running CentOS 7 with a 5.4.x kernel (5.4.34) on the hosts, and the
VMs on them (qemu-kvm) run the same OS and kernel.
We have already seen a few times that a VM gets stuck, with a bunch of
kernel messages about soft lockups and similar problems.
I have tried searching for whether any of these traces were already
fixed since the v5.4 tag, but found only some fixes related to address
sanitizers, and those did not seem to be candidates for fixing this
issue.
Unfortunately, we don't have a reproducer for this, but it did not seem
to happen on 4.19.x kernels. Any help is much appreciated!
[3142674.625842][T283860] BUG: unable to handle page fault for address: ffff893ebf410362
...
[3142757.454613][ C4] CPU: 4 PID: 283852 Comm: vertica Tainted: G EL 5.4.34-1.el7.gdc.x86_64 #1
[3142757.455863][ C5] CPU: 5 PID: 284163 Comm: vertica Tainted: G EL 5.4.34-1.el7.gdc.x86_64 #1
[3142757.456482][ C7] watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [sshd:284646]
...
[3142757.456505][ C7] CPU: 7 PID: 284646 Comm: sshd Tainted: G EL 5.4.34-1.el7.gdc.x86_64 #1
...
[3142757.456511][ C7] RIP: 0010:smp_call_function_many+0x21e/0x280
...
[3142757.456521][ C7] Call Trace:
[3142757.456526][ C7] ? do_kernel_range_flush+0x50/0x50
[3142757.456527][ C7] on_each_cpu+0x28/0x50
[3142757.456530][ C7] __purge_vmap_area_lazy+0x86/0x680
[3142757.456531][ C7] ? purge_fragmented_blocks_allcpus+0x42/0x1f0
[3142757.456533][ C7] _vm_unmap_aliases+0xf2/0x130
[3142757.456535][ C7] change_page_attr_set_clr+0xb6/0x1e0
[3142757.456537][ C7] set_memory_ro+0x2d/0x40
[3142757.456539][ C7] bpf_int_jit_compile+0x2ee/0x3e0
[3142757.456543][ C7] bpf_prog_select_runtime+0xdf/0x140
[3142757.456546][ C7] bpf_prepare_filter+0x597/0x610
[3142757.456547][ C7] bpf_prog_create_from_user+0xd5/0x120
[3142757.456550][ C7] ? hardlockup_detector_perf_cleanup+0xa0/0xa0
[3142757.456551][ C7] do_seccomp+0x294/0x8c0
[3142757.456554][ C7] __x64_sys_prctl+0x280/0x58d
[3142757.456557][ C7] do_syscall_64+0x5b/0x1b0
[3142757.456560][ C7] entry_SYSCALL_64_after_hwframe+0x44/0xa9
...
[3142757.459517][ C8] Call Trace:
[3142757.459521][ C8] ? flush_tlb_func_common.isra.8+0x230/0x230
[3142757.459522][ C8] ? flush_tlb_func_common.isra.8+0x230/0x230
[3142757.459523][ C8] on_each_cpu_mask+0x24/0x60
[3142757.459524][ C8] ? x86_configure_nx+0x50/0x50
[3142757.459525][ C8] on_each_cpu_cond_mask+0xa5/0x140
[3142757.459526][ C8] ? flush_tlb_func_common.isra.8+0x147/0x230
[3142757.459527][ C8] flush_tlb_mm_range+0xab/0xe0
[3142757.459530][ C8] ptep_clear_flush+0x56/0x60
[3142757.459532][ C8] wp_page_copy+0x34a/0x7e0
[3142757.459534][ C8] do_wp_page+0x8b/0x660
[3142757.459536][ C8] __handle_mm_fault+0x777/0xef0
[3142757.459538][ C8] handle_mm_fault+0xe2/0x1f0
[3142757.459541][ C8] __do_page_fault+0x226/0x490
[3142757.459543][ C8] do_page_fault+0x31/0x120
[3142757.459545][ C8] async_page_fault+0x3e/0x50
...
The full dmesg is attached as a file. After the last line, the VM froze,
and we were not able to get a crash dump of the VM, due to either a
libvirt bug or something similar.
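For context, the sshd stack on CPU 7 enters the kernel through prctl(2)
while installing a seccomp filter (do_seccomp -> bpf_prog_create_from_user),
and the JIT step then calls set_memory_ro(), which leads to the vmap purge
and the on_each_cpu() kernel TLB flush where that CPU is spinning in
smp_call_function_many(). A minimal sketch of a userspace call that takes
the same route follows. The names allow_all_insns, allow_all_prog and
install_allow_all_filter are hypothetical, it assumes the BPF JIT is
enabled (as the bpf_int_jit_compile frame in the trace suggests), and it
is not a reproducer for the hang.

```c
#include <sys/prctl.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Hypothetical minimal filter: a single BPF instruction that allows
 * every system call. Real sandboxes such as sshd's use much longer
 * programs, but they are installed through the same prctl() call. */
struct sock_filter allow_all_insns[] = {
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

struct sock_fprog allow_all_prog = {
    .len = sizeof(allow_all_insns) / sizeof(allow_all_insns[0]),
    .filter = allow_all_insns,
};

/* Installs the filter on the calling thread. Returns 0 on success and
 * -1 on failure (errno is set by prctl). Assuming the BPF JIT is
 * enabled, the second prctl() is where the kernel JIT-compiles the
 * program; on the 5.4 x86-64 trace above that path reaches
 * bpf_int_jit_compile() -> set_memory_ro(). */
int install_allow_all_filter(void)
{
    /* Without CAP_SYS_ADMIN, filter mode requires no_new_privs;
     * otherwise the kernel rejects PR_SET_SECCOMP with EACCES. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &allow_all_prog) != 0)
        return -1;
    return 0;
}
```

Each successful call stacks one more filter on the calling thread, so
every call goes through the JIT path again.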
--
Igor Raits <igor.raits@...il.com>
Attachment: "dmesg.txt" (text/plain, 170575 bytes)
Attachment: "signature.asc" (application/pgp-signature, 834 bytes)