[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170116193343.GB4639@amt.cnet>
Date: Mon, 16 Jan 2017 17:33:46 -0200
From: Marcelo Tosatti <mtosatti@...hat.com>
To: Radim Krcmar <rkrcmar@...hat.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Paolo Bonzini <pbonzini@...hat.com>,
Richard Cochran <richardcochran@...il.com>,
Miroslav Lichvar <mlichvar@...hat.com>
Subject: Re: [patch 3/3] PTP: add kvm PTP driver
On Mon, Jan 16, 2017 at 06:46:49PM +0100, Radim Krcmar wrote:
> 2017-01-16 15:04-0200, Marcelo Tosatti:
> > On Mon, Jan 16, 2017 at 05:26:53PM +0100, Radim Krcmar wrote:
> >> 2017-01-13 15:40-0200, Marcelo Tosatti:
> >> > On Fri, Jan 13, 2017 at 04:56:58PM +0100, Radim Krcmar wrote:
> >> > > 2017-01-13 10:01-0200, Marcelo Tosatti:
> >> >> > + version = pvclock_read_begin(src);
> >> >> > +
> >> >> > + ret = kvm_hypercall2(KVM_HC_CLOCK_OFFSET,
> >> >> > + clock_off_gpa,
> >> >> > + KVM_CLOCK_OFFSET_WALLCLOCK);
> >> >> > + if (ret != 0) {
> >> >> > + pr_err("clock offset hypercall ret %lu\n", ret);
> >> >> > + spin_unlock(&kvm_ptp_lock);
> >> >> > + preempt_enable_notrace();
> >> >> > + return -EOPNOTSUPP;
> >> >> > + }
> >> >> > +
> >> >> > + tspec.tv_sec = clock_off.sec;
> >> >> > + tspec.tv_nsec = clock_off.nsec;
> >> >> > +
> >> >> > + delta = rdtsc_ordered() - clock_off.tsc;
> >> >> > +
> >> >> > + offset = pvclock_scale_delta(delta, src->tsc_to_system_mul,
> >> >> > + src->tsc_shift);
> >> >> > +
> >> >> > + } while (pvclock_read_retry(src, version));
> >> >> > +
> >> >> > + preempt_enable_notrace();
> >> >> > +
> >> >> > + tspec.tv_nsec = tspec.tv_nsec + offset;
> >> >> > +
> >> >> > + spin_unlock(&kvm_ptp_lock);
> >> >> > +
> >> >> > + if (tspec.tv_nsec >= NSEC_PER_SEC) {
> >> >> > + u64 secs = tspec.tv_nsec;
> >> >> > +
> >> >> > + tspec.tv_nsec = do_div(secs, NSEC_PER_SEC);
> >> >> > + tspec.tv_sec += secs;
> >> >> > + }
> >> >> > +
> >> >> > + memcpy(ts, &tspec, sizeof(struct timespec64));
> >> >>
> >> >> But the whole idea is of improving the time by reading tsc a bit later
> >> >> is just weird ... why is it better to provide
> >> >>
> >> >> tsc + x, time + tsc_delta_to_time(x)
> >> >>
> >> >> than just
> >> >>
> >> >> tsc, time
> >> >>
> >> >> ?
> >> >
> >> > Because you want to calculate the value of the host realtime clock
> >> > at the moment of ptp_kvm_gettime.
> >> >
> >> > We do:
> >> >
> >> > 1. kvm_hypercall.
> >> > 2. get {sec, nsec, guest_tsc}.
> >> > 3. kvm_hypercall returns.
> >> > 4. delay = rdtsc() - guest_tsc.
> >> >
> >> > Where delay is the delta (measured with the TSC) between points 2 and 4.
> >>
> >> I see now ... the PTP interface is just not good for our purposes.
> >> We don't return {sec, nsec, guest_tsc}, we just return {sec, nsec} at
> >> some random time in the past. And to make it a bit more accurate, you
> >> add a best-effort delta before returning, which makes sense.
> >
> > Not random time in the past. We return {sec, nsec} from the host
> > realtime at the moment the user ran the hypercall.
>
> That is what we want, but {sec, nsec} is not anchored to any running
> time, so it is unavoidably late when we return it.
>
> > Since PTP is very accurate, that "a bit more" counts, yes.
> >
> >> When we have to depend on pvclock, what are the advantages of not using
> >> the existing pvclock API for wall clock?
> >> (You mentioned some extensions.)
> >>
> >> struct pvclock_wall_clock {
> >> u32 version;
> >> u32 sec;
> >> u32 nsec;
> >> } __attribute__((__packed__));
> >
> >> It gives the wall clock when pvclock was 0, so you just add current
> >> kvmclock and get the host wall clock.
> >
> > Well, no. For one, the TSC part of kvmclock:
> >
> > kvmclock-read = system_timestamp + convert-to-1GHz(rdtsc() - tsc_timestamp)
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > Drifts relative to UTC. This part can be large.
> > The guests NTP is responsible for fixing
> > that drift of the guests realtime clock (talking about current setup,
> > without KVM PTP driver).
> >
> > Now, we want very high precision (less than 1us) for this
> > driver. Very small TSC drifts on a large delta defeat the purpose.
>
> True.
>
> >> Without a VM exit.
> >
> > Huge performance is not an issue. Accuracy (how different from the host
> > realtime clock our "approximation" of the host realtime clock) is.
>
> Ah, ok.
>
> >> And how often is ptp_kvm_gettime() usually called?
> >
> > The PTP_SYS_OFFSET ioctl calls the following code in a loop:
> >
> > struct ptp_sys_offset {
> > unsigned int n_samples; /* Desired number of measurements. */
> > unsigned int rsv[3]; /* Reserved for future use. */
> > /*
> > * Array of interleaved system/phc time stamps. The kernel
> > * will provide 2*n_samples + 1 time stamps, with the last
> > * one as a system time stamp.
> > */
> > struct ptp_clock_time ts[2 * PTP_MAX_SAMPLES + 1];
> > };
> >
> > #define PTP_MAX_SAMPLES 25 /* Maximum allowed offset measurement
> > samples. */
> >
> > case PTP_SYS_OFFSET:
> > sysoff = memdup_user((void __user *)arg,
> > sizeof(*sysoff));
> > if (IS_ERR(sysoff)) {
> > err = PTR_ERR(sysoff);
> > sysoff = NULL;
> > break;
> > }
> > if (sysoff->n_samples > PTP_MAX_SAMPLES) {
> > err = -EINVAL;
> > break;
> > }
> > pct = &sysoff->ts[0];
> > for (i = 0; i < sysoff->n_samples; i++) {
> > getnstimeofday64(&ts);
> > pct->sec = ts.tv_sec;
> > pct->nsec = ts.tv_nsec;
> > pct++;
> > ptp->info->gettime64(ptp->info, &ts);
> > pct->sec = ts.tv_sec;
> > pct->nsec = ts.tv_nsec;
> > pct++;
>
> Hm, this loop alternates between system time and ptp time and I'd guess
> that userspace then manipulates deltas of these two readings and applies
> the result to system time.
Yes.
> I'm not even sure that pvclock delta improves the situation anymore --
> it adds an offset to all gettime64() reads, but that can be computed
> from multiple datapoints. (And the delta also adds uncertainty to
> results.)
Sorry: the value returned from ->gettime64 should be the value of the
clock at that instant.
If you remove the pvclock delta (lets assume its 2000ns), then you'll
sync getnstimeofday64 to a clock thats 2000ns behind the host clock.
Does that make sense?
> > }
> > getnstimeofday64(&ts);
> > pct->sec = ts.tv_sec;
> > pct->nsec = ts.tv_nsec;
> >
> > How often that ioctl is called depends on the parameters of the Chrony
> > PHC code. Initially (to determine the clock difference Chrony should call it
> > more frequently, later on it should call it less frequency).
> >
> > Perhaps once every second initially (the ioctl). I'll confirm with the
> > exact value for my setup and reply to this email.
>
> Thanks.
Attached. 1551 hypercalls in 38 secs is about 40 times per second.
The ioctl is calling ->gettime64 10 per ioctl, so thats about 4
ioctls per sec.
View attachment "trace.txt" of type "text/plain" (185239 bytes)
Powered by blists - more mailing lists