linux-kernel - [PATCH 0/2] Fix the Xen HVM kdump/kexec boot panic issue

Message-Id: <20211012072428.2569-1-dongli.zhang@oracle.com>
Date:   Tue, 12 Oct 2021 00:24:26 -0700
From:   Dongli Zhang <dongli.zhang@...cle.com>
To:     xen-devel@...ts.xenproject.org
Cc:     linux-kernel@...r.kernel.org, x86@...nel.org,
        boris.ostrovsky@...cle.com, jgross@...e.com,
        sstabellini@...nel.org, tglx@...utronix.de, mingo@...hat.com,
        bp@...en8.de, hpa@...or.com, andrew.cooper3@...rix.com,
        george.dunlap@...rix.com, iwj@...project.org, jbeulich@...e.com,
        julien@....org, wl@....org, joe.jin@...cle.com
Subject: [PATCH 0/2] Fix the Xen HVM kdump/kexec boot panic issue

When kdump/kexec is enabled in an HVM VM, a kernel panic traps to Xen with
reason=soft_reset. As a result, Xen reboots the VM into the kdump kernel.

Unfortunately, when the VM is panicked with the command line below ...

"taskset -c 33 echo c > /proc/sysrq-trigger"

... the kdump kernel panics at an early stage ...

PANIC: early exception 0x0e IP 10:ffffffffa8c66876 error 0 cr2 0x20
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0-rc5xen #1
[    0.000000] Hardware name: Xen HVM domU
[    0.000000] RIP: 0010:pvclock_clocksource_read+0x6/0xb0
... ...
[    0.000000] RSP: 0000:ffffffffaa203e20 EFLAGS: 00010082 ORIG_RAX: 0000000000000000
[    0.000000] RAX: 0000000000000003 RBX: 0000000000010000 RCX: 00000000ffffdfff
[    0.000000] RDX: 0000000000000003 RSI: 00000000ffffdfff RDI: 0000000000000020
[    0.000000] RBP: 0000000000011000 R08: 0000000000000000 R09: 0000000000000001
[    0.000000] R10: ffffffffaa203e00 R11: ffffffffaa203c70 R12: 0000000040000004
[    0.000000] R13: ffffffffaa203e5c R14: ffffffffaa203e58 R15: 0000000000000000
[    0.000000] FS:  0000000000000000(0000) GS:ffffffffaa95e000(0000) knlGS:0000000000000000
[    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.000000] CR2: 0000000000000020 CR3: 00000000ec9e0000 CR4: 00000000000406a0
[    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.000000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    0.000000] Call Trace:
[    0.000000]  ? xen_init_time_common+0x11/0x55
[    0.000000]  ? xen_hvm_init_time_ops+0x23/0x45
[    0.000000]  ? xen_hvm_guest_init+0x214/0x251
[    0.000000]  ? 0xffffffffa8c00000
[    0.000000]  ? setup_arch+0x440/0xbd6
[    0.000000]  ? start_kernel+0x6a/0x689
[    0.000000]  ? secondary_startup_64_no_verify+0xc2/0xcb

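This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info'
structures embedded inside 'shared_info' during the early boot stage, until
xen_vcpu_setup() is used to allocate/relocate the 'vcpu_info' for the boot
cpu at an arbitrary address. The panic above was triggered on CPU 33, so the
kdump kernel boots on a vcpu beyond that limit and has no embedded
'vcpu_info' to read the pvclock time from.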

The 1st patch fixes the issue on the VM kernel side. However, we may still
observe clock drift in the VM due to an issue on the Xen hypervisor side:
the pv vcpu_time_info is not updated when VCPUOP_register_vcpu_info is
handled.

The 2nd patch calls force_update_vcpu_system_time() on the Xen side when
VCPUOP_register_vcpu_info is handled, to avoid VM clock drift during kdump
kernel boot.

I tested the fix by backporting the 2nd patch to an older Xen version,
because I am not able to use soft_reset successfully with mainline Xen.
I encountered the error below when testing soft_reset with mainline Xen.
Please let me know if there is any known issue/solution.

# xl -v create -F vm.cfg
... ...
... ...
Domain 1 has shut down, reason code 5 0x5
Action for shutdown reason code 5 is soft-reset
Done. Rebooting now
xc: error: Failed to set d1's policy (err leaf 0xffffffff, subleaf 0xffffffff, msr 0xffffffff) (17 = File exists): Internal error
libxl: error: libxl_cpuid.c:488:libxl__cpuid_legacy: Domain 1:Failed to apply CPUID policy: File exists
libxl: error: libxl_create.c:1573:domcreate_rebuild_done: Domain 1:cannot (re-)build domain: -3
libxl: error: libxl_xshelp.c:201:libxl__xs_read_mandatory: xenstore read failed: `/libxl/1/type': No such file or directory
libxl: warning: libxl_dom.c:53:libxl__domain_type: unable to get domain type for domid=1, assuming HVM

Thank you very much!

Dongli Zhang