lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fe6d7aaf-97b5-dfa2-75e5-1a072f4aac92@oracle.com>
Date:   Thu, 4 Nov 2021 10:59:55 -0700
From:   Dongli Zhang <dongli.zhang@...cle.com>
To:     Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        xen-devel@...ts.xenproject.org, x86@...nel.org
Cc:     jgross@...e.com, sstabellini@...nel.org, tglx@...utronix.de,
        mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
        hpa@...or.com, joe.jin@...cle.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/1] xen: delay xen_hvm_init_time_ops() if kdump is
 boot on vcpu>=32

Hi Boris,

On 11/1/21 10:34 AM, Boris Ostrovsky wrote:
> 
> On 10/27/21 9:25 PM, Dongli Zhang wrote:
>> The sched_clock() can be used very early since
>> commit 857baa87b642 ("sched/clock: Enable sched clock early"). In addition,
>> with commit 38669ba205d1 ("x86/xen/time: Output xen sched_clock time from
>> 0"), kdump kernel in Xen HVM guest may panic at very early stage when
>> accessing &__this_cpu_read(xen_vcpu)->time as in below:
>>
>> setup_arch()
>>   -> init_hypervisor_platform()
>>       -> x86_init.hyper.init_platform = xen_hvm_guest_init()
>>           -> xen_hvm_init_time_ops()
>>               -> xen_clocksource_read()
>>                   -> src = &__this_cpu_read(xen_vcpu)->time;
>>
>> This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info'
>> embedded inside 'shared_info' during early stage until xen_vcpu_setup() is
>> used to allocate/relocate 'vcpu_info' for boot cpu at arbitrary address.
>>
>> However, when Xen HVM guest panic on vcpu >= 32, since
>> xen_vcpu_info_reset(0) would set per_cpu(xen_vcpu, cpu) = NULL when
>> vcpu >= 32, xen_clocksource_read() on vcpu >= 32 would panic.
>>
>> This patch delays xen_hvm_init_time_ops() to later in
>> xen_hvm_smp_prepare_boot_cpu() after the 'vcpu_info' for boot vcpu is
>> registered when the boot vcpu is >= 32.
>>
>> Another option is to always delay xen_hvm_init_time_ops() for any vcpus
>> (including vcpu=0). Since to delay xen_hvm_init_time_ops() may lead to
>> clock backward issue,
> 
> 
> This is referring to
> https://lists.xenproject.org/archives/html/xen-devel/2021-10/msg01516.html I
> assume?

No.

So far there are clock forward (e.g., from 0 to 65345) issue and clock backward
issue (e.g., from 2.432 to 0).

The clock forward issue can be resolved by above link to enforce clock update
during vcpu info registration. However, so far I am only able to reproduce clock
forward when "taskset -c 33 echo c > /proc/sysrq-trigger".

That is, I am not able to see any clock forward drift during regular boot (on
CPU=0), without the fix at hypervisor side.

The clock backward issue is because the native clock source is used if we delay
the initialization of xen clock source. We will see a backward when the source
is switched from native to xen.

> 
> 
>>   it is preferred to avoid that for regular boot (The
>> pv_sched_clock=native_sched_clock() is used at the very beginning until
>> xen_sched_clock() is registered). That requires to adjust
>> xen_sched_clock_offset. That's why we only delay xen_hvm_init_time_ops()
>> for vcpu>=32.
> 
> 
> We delay only on VCPU>=32 because we want to avoid the clock going backwards due
> to hypervisor problem pointed to be the link above, not because we need to
> adjust xen_sched_clock_offset (which we could if we wanted).

I will add that.

Just to clarify that so far I am not able to reproduce the clock backward issue
during regular boot (on CPU=0), when the fix is not available at hypervisor side.

> 
> 
>>
>> This issue can be reproduced on purpose via below command at the guest
>> side when kdump/kexec is enabled:
>>
>> "taskset -c 33 echo c > /proc/sysrq-trigger"
>>
>> Reference:
>> https://lists.xenproject.org/archives/html/xen-devel/2021-10/msg00571.html
>> Cc: Joe Jin <joe.jin@...cle.com>
>> Signed-off-by: Dongli Zhang <dongli.zhang@...cle.com>
>> ---
>> Changed since v1:
>>    - Add commit message to explain why xen_hvm_init_time_ops() is delayed
>>      for any vcpus. (Suggested by Boris Ostrovsky)
>>    - Add a comment in xen_hvm_smp_prepare_boot_cpu() referencing the related
>>      code in xen_hvm_guest_init(). (suggested by Juergen Gross)
>>
>>   arch/x86/xen/enlighten_hvm.c | 20 +++++++++++++++++++-
>>   arch/x86/xen/smp_hvm.c       |  8 ++++++++
>>   2 files changed, 27 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
>> index e68ea5f4ad1c..7734dec52794 100644
>> --- a/arch/x86/xen/enlighten_hvm.c
>> +++ b/arch/x86/xen/enlighten_hvm.c
>> @@ -216,7 +216,25 @@ static void __init xen_hvm_guest_init(void)
>>       WARN_ON(xen_cpuhp_setup(xen_cpu_up_prepare_hvm, xen_cpu_dead_hvm));
>>       xen_unplug_emulated_devices();
>>       x86_init.irqs.intr_init = xen_init_IRQ;
>> -    xen_hvm_init_time_ops();
>> +
>> +    /*
>> +     * Only MAX_VIRT_CPUS 'vcpu_info' are embedded inside 'shared_info'
>> +     * and the VM would use them until xen_vcpu_setup() is used to
>> +     * allocate/relocate them at arbitrary address.
>> +     *
>> +     * However, when Xen HVM guest panic on vcpu >= MAX_VIRT_CPUS,
>> +     * per_cpu(xen_vcpu, cpu) is still NULL at this stage. To access
>> +     * per_cpu(xen_vcpu, cpu) via xen_clocksource_read() would panic.
>> +     *
>> +     * Therefore we delay xen_hvm_init_time_ops() to
>> +     * xen_hvm_smp_prepare_boot_cpu() when boot vcpu is >= MAX_VIRT_CPUS.
>> +     */
>> +    if (xen_vcpu_nr(0) >= MAX_VIRT_CPUS)
>> +        pr_info("Delay xen_hvm_init_time_ops() as kernel is running on
>> vcpu=%d\n",
>> +            xen_vcpu_nr(0));
>> +    else
>> +        xen_hvm_init_time_ops();
>> +
>>       xen_hvm_init_mmu_ops();
>>     #ifdef CONFIG_KEXEC_CORE
>> diff --git a/arch/x86/xen/smp_hvm.c b/arch/x86/xen/smp_hvm.c
>> index 6ff3c887e0b9..f99043df8bb5 100644
>> --- a/arch/x86/xen/smp_hvm.c
>> +++ b/arch/x86/xen/smp_hvm.c
>> @@ -19,6 +19,14 @@ static void __init xen_hvm_smp_prepare_boot_cpu(void)
>>        */
>>       xen_vcpu_setup(0);
>>   +    /*
>> +     * The xen_hvm_init_time_ops() is delayed from
>> +     * xen_hvm_guest_init() to here to avoid panic when the kernel
>> +     * boots from vcpu>=MAX_VIRT_CPUS (32).
>> +     */
> 
> 
> How about
> 
>   /* Deferred call to xen_hvm_init_time_ops(). See comment in
> xen_hvm_guest_init() */
> 

I will do that.

Thank you very much!

Dongli Zhang

> 
> -boris
> 
> 
> 
>> +    if (xen_vcpu_nr(0) >= MAX_VIRT_CPUS)
>> +        xen_hvm_init_time_ops();
>> +
>>       /*
>>        * The alternative logic (which patches the unlock/lock) runs before
>>        * the smp bootup up code is activated. Hence we need to set this up

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ