Date:   Wed, 16 Nov 2016 12:27:48 -0500 (EST)
From:   Paolo Bonzini <pbonzini@...hat.com>
To:     Radim Krčmář <rkrcmar@...hat.com>
Cc:     linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        mtosatti@...hat.com
Subject: Re: [PATCH v2] KVM: x86: do not go through vcpu in
 __get_kvmclock_ns


> > -	if (vcpu->arch.hv_clock.flags & PVCLOCK_TSC_STABLE_BIT) {
> > -		u64 tsc = kvm_read_l1_tsc(vcpu, rdtsc());
> > -		ns = __pvclock_read_cycles(&vcpu->arch.hv_clock, tsc);
> > -	} else {
> > -		ns = ktime_get_boot_ns() + ka->kvmclock_offset;
> > -	}
> 
> If we access the "global" master clock, it would be better to prevent it
> from changing under our hands with
>   	spin_lock(&ka->pvclock_gtod_sync_lock).

Yes, good idea.
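
Something along these lines, i.e. snapshotting the master clock fields under
the lock and doing the scaling outside of it?  (Untested sketch on top of the
hunk below; all the names are taken from the quoted patch, only the lock
placement is new.)

static u64 __get_kvmclock_ns(struct kvm *kvm)
{
	struct kvm_arch *ka = &kvm->arch;
	struct pvclock_vcpu_time_info hv_clock;

	spin_lock(&ka->pvclock_gtod_sync_lock);
	if (!ka->use_master_clock) {
		spin_unlock(&ka->pvclock_gtod_sync_lock);
		return ktime_get_boot_ns() + ka->kvmclock_offset;
	}

	/* Snapshot the master clock while holding the lock... */
	hv_clock.tsc_timestamp = ka->master_cycle_now;
	hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
	spin_unlock(&ka->pvclock_gtod_sync_lock);

	/* ...and scale the host TSC reading outside of it. */
	kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
			   &hv_clock.tsc_shift,
			   &hv_clock.tsc_to_system_mul);
	return __pvclock_read_cycles(&hv_clock, rdtsc());
}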

> > +	if (!ka->use_master_clock)
> > +		return ktime_get_boot_ns() + ka->kvmclock_offset;
> >  
> > -	return ns;
> > +	hv_clock.tsc_timestamp = ka->master_cycle_now;
> > +	hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
> > +	kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL,
> > +			   &hv_clock.tsc_shift,
> > +			   &hv_clock.tsc_to_system_mul);
> 
> Doesn't this result in a minor drift with the scaled clock, because the
> guest can be combining two systems that each approximate the frequency?

You mean instead of doing read_l1_tsc?

>   1) tsc_shift and tsc_to_system_mul for kvmclock scaling
>   2) hardware TSC scaling ratio
> 
> If we are on a 7654321 kHz TSC and TSC-ratio scale to 1234567 kHz and
> then tsc_shift+tsc_to_system_mul kvmclock-scale to 1000000 kHz, we
> should be using multipliers of
>   0.161290204578564186163606151349022336533834941074459772460...  and
>   0.810000591300431649315104000025920018921613812778083328000...,
> to achieve that.  Those multipliers cannot be precisely expressed in
> what we have (shifts and 64/32 bit multipliers with intermediate values
> only up to 128 bits), so performing the scaling will result in a slightly
> incorrect frequency.
> 
> The result of combining two operations that alter the frequency is quite
> unlikely to cancel out and produce the same result as an operation that
> uses a different shift+multiplier to scale in one step, so I think that
> we aren't getting the same time that the guest with TSC scaling is seeing.

I think you get pretty good precision, since 30 fractional bits are more
or less equivalent to nanosecond precision.  For example, truncating the two
ratios above to 30 fractional bits I get 173184038/2^30 and 869731512/2^30
respectively.  Multiplying them gives 140279173/2^30, which matches exactly
the fixed-point representation of 1000000/7654321.

Since the TSC scaling ratio has a larger precision (32 or 48 fractional
bits), you should get at most 1 ulp of error, which is not bad.
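
For what it's worth, the arithmetic above can be double-checked with a small
standalone program (just a sketch: the q30() helper is ad hoc, it uses plain
truncation to 30 fractional bits and gcc's unsigned __int128, not the actual
shift/mul form that kvm_get_time_scale produces):

#include <stdio.h>
#include <stdint.h>

/* value/base as a fixed-point number with 30 fractional bits, truncated */
static uint64_t q30(uint64_t value, uint64_t base)
{
	return (uint64_t)(((unsigned __int128)value << 30) / base);
}

int main(void)
{
	/* hardware TSC scaling:  7654321 kHz -> 1234567 kHz */
	uint64_t tsc_ratio = q30(1234567, 7654321);
	/* kvmclock scaling:      1234567 kHz -> 1000000 kHz */
	uint64_t kvmclock  = q30(1000000, 1234567);
	/* the two steps combined, vs. scaling directly in one step */
	uint64_t two_steps = (uint64_t)(((unsigned __int128)tsc_ratio * kvmclock) >> 30);
	uint64_t one_step  = q30(1000000, 7654321);

	printf("%llu * %llu >> 30 = %llu, direct = %llu\n",
	       (unsigned long long)tsc_ratio, (unsigned long long)kvmclock,
	       (unsigned long long)two_steps, (unsigned long long)one_step);
	return 0;
}

It prints "173184038 * 869731512 >> 30 = 140279173, direct = 140279173",
i.e. the two-step and the one-step scaling agree to the last bit here.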

Paolo

> (I'd be happier if we didn't ignore this drift when the whole endeavor
>  started just to get rid of a drift, but introducing a minor bug still
>  improves the situation -- I'm ok with just the first two changes.)
> 
> > +	return __pvclock_read_cycles(&hv_clock, rdtsc());
> >  }
> >  
> >  u64 get_kvmclock_ns(struct kvm *kvm)
> > --
> > 1.8.3.1
> > 
> 
