Message-ID: <4C44B819.2030203@redhat.com>
Date:	Mon, 19 Jul 2010 10:39:53 -1000
From:	Zachary Amsden <zamsden@...hat.com>
To:	Avi Kivity <avi@...hat.com>
CC:	KVM <kvm@...r.kernel.org>, Marcelo Tosatti <mtosatti@...hat.com>,
	Glauber Costa <glommer@...hat.com>,
	Linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 09/18] Robust TSC compensation

On 07/18/2010 04:52 AM, Avi Kivity wrote:
> On 07/13/2010 05:25 AM, Zachary Amsden wrote:
>> Make the match of TSC find TSC writes that are close to each other
>> instead of perfectly identical; this allows the compensator to also
>> work in migration / suspend scenarios.
>>
>
> What scenario exactly?

After migration, qemu writes the saved MSRs, including the TSC, back to
the VCPUs.  The values won't match exactly, because they are read out at
different times (in fact, because the TSCs of the VCPUs never stop, they
can differ wildly if there was some host overload / swap / suspend
event).

When restarting the CPUs, qemu will try to write back the TSC and then 
we end up desynchronizing the system.

It's an ugly problem, and this is an ugly solution.

Better would be to "stop" the VCPUs (requires some kernel 
synchronization to determine TSC stop point), or to simply take the 
maximum TSC in qemu and write that to all of the CPUs (this assumes the 
guest wants to have TSCs in sync at all).

Both methods have to assume small deltas in TSC are unintentional 
effects in order to correctly resynchronize.
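
A minimal sketch of the second option (take the maximum TSC and write it to all CPUs), written as plain userspace C rather than actual qemu code. The function names and the `saved` array are hypothetical; they stand in for whatever per-VCPU TSC values were read out before migration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: largest TSC value saved across all VCPUs. */
static uint64_t max_saved_tsc(const uint64_t *saved, size_t n)
{
    uint64_t max = 0;

    for (size_t i = 0; i < n; i++)
        if (saved[i] > max)
            max = saved[i];
    return max;
}

/*
 * Hypothetical helper: replace every saved value with the common
 * maximum, so that restoring them leaves all VCPUs in sync and no
 * VCPU sees its TSC go backwards.
 */
static void sync_saved_tsc(uint64_t *saved, size_t n)
{
    uint64_t max = max_saved_tsc(saved, n);

    for (size_t i = 0; i < n; i++)
        saved[i] = max;
}
```

This only makes sense under the assumption stated above, namely that the guest wants its TSCs in sync in the first place.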

>
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -926,21 +926,27 @@ void guest_write_tsc(struct kvm_vcpu *vcpu, u64 data)
>>       struct kvm *kvm = vcpu->kvm;
>>       u64 offset, ns, elapsed;
>>       struct timespec ts;
>> +    s64 sdiff;
>>
>>       spin_lock(&kvm->arch.tsc_write_lock);
>>       offset = data - native_read_tsc();
>>       ns = get_kernel_ns();
>>       elapsed = ns - kvm->arch.last_tsc_nsec;
>> +    sdiff = data - kvm->arch.last_tsc_write;
>> +    if (sdiff < 0)
>> +        sdiff = -sdiff;
>>
>>       /*
>> -     * Special case: identical write to TSC within 5 seconds of
>> +     * Special case: close write to TSC within 5 seconds of
>>        * another CPU is interpreted as an attempt to synchronize
>> -     * (the 5 seconds is to accomodate host load / swapping).
>> +     * The 5 seconds is to accomodate host load / swapping as
>> +     * well as any reset of TSC during the boot process.
>>        *
>>        * In that case, for a reliable TSC, we can match TSC offsets,
>> -     * or make a best guest using kernel_ns value.
>> +     * or make a best guest using elapsed value.
>>        */
>> -    if (data == kvm->arch.last_tsc_write && elapsed < 5ULL * NSEC_PER_SEC) {
>> +    if (sdiff < nsec_to_cycles(5ULL * NSEC_PER_SEC) &&
>> +        elapsed < 5ULL * NSEC_PER_SEC) {
>>           if (!check_tsc_unstable()) {
>>               offset = kvm->arch.last_tsc_offset;
>>               pr_debug("kvm: matched tsc offset for %llu\n", data);
>
> Don't we have to adjust offset to the required difference between tscs?
> Or do we assume that, if the guest wrote close enough values, it is
> trying to cleverly compensate for IPI latency?
>

No, we have to assume that any small (small being defined as < 5 seconds)
difference is unintentional.  It's not perfect and is certainly
error-prone (without one of the two assists from qemu that I mention above).
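
As an illustration of that rule, here is a standalone userspace sketch of the "close write" test from the hunk above. It is not the kernel code: `ns_to_cycles()` is a hypothetical stand-in for the patch's `nsec_to_cycles()` that takes the TSC frequency as an explicit `tsc_khz` argument, and the absolute difference is computed on unsigned values instead of negating an `s64`.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC   1000000000ULL
#define SYNC_WINDOW_NS (5ULL * NSEC_PER_SEC)

/*
 * Hypothetical stand-in for nsec_to_cycles(): convert nanoseconds to
 * TSC cycles for a TSC running at tsc_khz.  Fine for a 5 second
 * window; very large ns values would overflow the multiplication.
 */
static uint64_t ns_to_cycles(uint64_t ns, uint64_t tsc_khz)
{
    return ns * tsc_khz / 1000000ULL;
}

/*
 * A write counts as a synchronization attempt when the written value
 * is within 5 seconds' worth of cycles of the previous write AND the
 * previous write happened less than 5 seconds of wall time ago.
 */
static bool looks_like_sync_write(uint64_t data, uint64_t last_write,
                                  uint64_t elapsed_ns, uint64_t tsc_khz)
{
    uint64_t diff = data > last_write ? data - last_write
                                      : last_write - data;

    return diff < ns_to_cycles(SYNC_WINDOW_NS, tsc_khz) &&
           elapsed_ns < SYNC_WINDOW_NS;
}
```

With a 1 GHz TSC (`tsc_khz = 1000000`) the window is 5,000,000,000 cycles, so any two writes closer than that, made within 5 seconds of each other, are treated as the same intended value.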

I think qemu should probably take the maximum TSC and apply it to all VCPUs.

Zach