[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1f19674f-e608-1faa-5656-fec853297198@oracle.com>
Date: Mon, 1 Nov 2021 13:34:03 -0400
From: Boris Ostrovsky <boris.ostrovsky@...cle.com>
To: Dongli Zhang <dongli.zhang@...cle.com>,
xen-devel@...ts.xenproject.org, x86@...nel.org
Cc: jgross@...e.com, sstabellini@...nel.org, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
hpa@...or.com, joe.jin@...cle.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/1] xen: delay xen_hvm_init_time_ops() if kdump is
boot on vcpu>=32
On 10/27/21 9:25 PM, Dongli Zhang wrote:
> The sched_clock() can be used very early since
> commit 857baa87b642 ("sched/clock: Enable sched clock early"). In addition,
> with commit 38669ba205d1 ("x86/xen/time: Output xen sched_clock time from
> 0"), kdump kernel in Xen HVM guest may panic at very early stage when
> accessing &__this_cpu_read(xen_vcpu)->time as in below:
>
> setup_arch()
> -> init_hypervisor_platform()
> -> x86_init.hyper.init_platform = xen_hvm_guest_init()
> -> xen_hvm_init_time_ops()
> -> xen_clocksource_read()
> -> src = &__this_cpu_read(xen_vcpu)->time;
>
> This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info'
> embedded inside 'shared_info' during early stage until xen_vcpu_setup() is
> used to allocate/relocate 'vcpu_info' for boot cpu at arbitrary address.
>
> However, when Xen HVM guest panic on vcpu >= 32, since
> xen_vcpu_info_reset(0) would set per_cpu(xen_vcpu, cpu) = NULL when
> vcpu >= 32, xen_clocksource_read() on vcpu >= 32 would panic.
>
> This patch delays xen_hvm_init_time_ops() to later in
> xen_hvm_smp_prepare_boot_cpu() after the 'vcpu_info' for boot vcpu is
> registered when the boot vcpu is >= 32.
>
> Another option is to always delay xen_hvm_init_time_ops() for any vcpus
> (including vcpu=0). Since to delay xen_hvm_init_time_ops() may lead to
> clock backward issue,
This is referring to https://lists.xenproject.org/archives/html/xen-devel/2021-10/msg01516.html I assume?
> it is preferred to avoid that for regular boot (The
> pv_sched_clock=native_sched_clock() is used at the very beginning until
> xen_sched_clock() is registered). That requires to adjust
> xen_sched_clock_offset. That's why we only delay xen_hvm_init_time_ops()
> for vcpu>=32.
We delay only on VCPU>=32 because we want to avoid the clock going backwards due to hypervisor problem pointed to be the link above, not because we need to adjust xen_sched_clock_offset (which we could if we wanted).
>
> This issue can be reproduced on purpose via below command at the guest
> side when kdump/kexec is enabled:
>
> "taskset -c 33 echo c > /proc/sysrq-trigger"
>
> Reference:
> https://lists.xenproject.org/archives/html/xen-devel/2021-10/msg00571.html
> Cc: Joe Jin <joe.jin@...cle.com>
> Signed-off-by: Dongli Zhang <dongli.zhang@...cle.com>
> ---
> Changed since v1:
> - Add commit message to explain why xen_hvm_init_time_ops() is delayed
> for any vcpus. (Suggested by Boris Ostrovsky)
> - Add a comment in xen_hvm_smp_prepare_boot_cpu() referencing the related
> code in xen_hvm_guest_init(). (suggested by Juergen Gross)
>
> arch/x86/xen/enlighten_hvm.c | 20 +++++++++++++++++++-
> arch/x86/xen/smp_hvm.c | 8 ++++++++
> 2 files changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
> index e68ea5f4ad1c..7734dec52794 100644
> --- a/arch/x86/xen/enlighten_hvm.c
> +++ b/arch/x86/xen/enlighten_hvm.c
> @@ -216,7 +216,25 @@ static void __init xen_hvm_guest_init(void)
> WARN_ON(xen_cpuhp_setup(xen_cpu_up_prepare_hvm, xen_cpu_dead_hvm));
> xen_unplug_emulated_devices();
> x86_init.irqs.intr_init = xen_init_IRQ;
> - xen_hvm_init_time_ops();
> +
> + /*
> + * Only MAX_VIRT_CPUS 'vcpu_info' are embedded inside 'shared_info'
> + * and the VM would use them until xen_vcpu_setup() is used to
> + * allocate/relocate them at arbitrary address.
> + *
> + * However, when Xen HVM guest panic on vcpu >= MAX_VIRT_CPUS,
> + * per_cpu(xen_vcpu, cpu) is still NULL at this stage. To access
> + * per_cpu(xen_vcpu, cpu) via xen_clocksource_read() would panic.
> + *
> + * Therefore we delay xen_hvm_init_time_ops() to
> + * xen_hvm_smp_prepare_boot_cpu() when boot vcpu is >= MAX_VIRT_CPUS.
> + */
> + if (xen_vcpu_nr(0) >= MAX_VIRT_CPUS)
> + pr_info("Delay xen_hvm_init_time_ops() as kernel is running on vcpu=%d\n",
> + xen_vcpu_nr(0));
> + else
> + xen_hvm_init_time_ops();
> +
> xen_hvm_init_mmu_ops();
>
> #ifdef CONFIG_KEXEC_CORE
> diff --git a/arch/x86/xen/smp_hvm.c b/arch/x86/xen/smp_hvm.c
> index 6ff3c887e0b9..f99043df8bb5 100644
> --- a/arch/x86/xen/smp_hvm.c
> +++ b/arch/x86/xen/smp_hvm.c
> @@ -19,6 +19,14 @@ static void __init xen_hvm_smp_prepare_boot_cpu(void)
> */
> xen_vcpu_setup(0);
>
> + /*
> + * The xen_hvm_init_time_ops() is delayed from
> + * xen_hvm_guest_init() to here to avoid panic when the kernel
> + * boots from vcpu>=MAX_VIRT_CPUS (32).
> + */
How about
/* Deferred call to xen_hvm_init_time_ops(). See comment in xen_hvm_guest_init() */
-boris
> + if (xen_vcpu_nr(0) >= MAX_VIRT_CPUS)
> + xen_hvm_init_time_ops();
> +
> /*
> * The alternative logic (which patches the unlock/lock) runs before
> * the smp bootup up code is activated. Hence we need to set this up
Powered by blists - more mailing lists