Message-ID: <ee9b0e64-9659-40bc-938d-f02fb411b6a4@default>
Date: Tue, 17 Apr 2012 08:36:11 -0700 (PDT)
From: Dan Magenheimer <dan.magenheimer@...cle.com>
To: Jan Beulich <JBeulich@...e.com>
Cc: David Vrabel <david.vrabel@...rix.com>,
Thomas Gleixner <tglx@...utronix.de>,
xen-devel <xen-devel@...ts.xen.org>,
Konrad Wilk <konrad.wilk@...cle.com>,
linux-kernel@...r.kernel.org, "Tim(Xen.org)" <tim@....org>,
Sheng Yang <sheng@...ker.org>
Subject: RE: [Xen-devel] [PATCH] xen: always set the sched clock as unstable
> From: Jan Beulich [mailto:JBeulich@...e.com]
> Subject: RE: [Xen-devel] [PATCH] xen: always set the sched clock as unstable
>
> >>> On 16.04.12 at 19:22, Dan Magenheimer <dan.magenheimer@...cle.com> wrote:
> > In upstream (and recent pv-ops) kernels, is there any need for there
> > to be a difference between HVM and PV in the clocksource chosen? The
>
> Yes, because RDTSC interception for PV guests is slow (using #GP
> and requiring instruction decode).
"Slow" is relative. I showed (somewhere on xen-devel years ago) that
the emulation performance hit is much smaller than the original developers
expected and is detectable only in certain applications that execute
rdtsc on the order of 100K times per second. Furthermore, the cycle
count of an rdtsc has gone up on modern systems, so the cost ratio of
emulating rdtsc vs executing the raw instruction is going down.
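To put "slow is relative" into numbers, here is a back-of-the-envelope
sketch. It is not a measurement: the function name and all three inputs
(call rate, extra cycles per emulated rdtsc, CPU frequency) are
hypothetical placeholders, chosen only to show the arithmetic.

```python
def emulation_overhead_fraction(calls_per_sec, extra_cycles_per_call, cpu_hz):
    """Fraction of one CPU's cycles spent on the *extra* cost of emulating rdtsc.

    All arguments are assumptions supplied by the caller:
      calls_per_sec         -- how often the workload executes rdtsc
      extra_cycles_per_call -- emulated cost minus native cost, in cycles
      cpu_hz                -- CPU clock frequency in cycles per second
    """
    if cpu_hz <= 0:
        raise ValueError("cpu_hz must be positive")
    return calls_per_sec * extra_cycles_per_call / cpu_hz


if __name__ == "__main__":
    # Hypothetical inputs, NOT measured values.
    rate = 100_000           # rdtsc calls per second
    extra = 1_000            # assumed extra cycles per emulated call
    freq = 2_000_000_000     # assumed 2 GHz clock
    print(f"{emulation_overhead_fraction(rate, extra, freq):.1%}")
```

The overhead scales linearly with the call rate, so a workload that
reads the TSC far less often than that sees a proportionally smaller
hit, whatever the real per-call cost turns out to be.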
> > pvclock algorithm was necessary for PV when non-TSC hardware clocks
> > were privileged and the only non-privileged hardware clock (TSC)
> > was badly broken in hardware and for migration/save/restore.
> > With TSC now working and stable, and now that we are making changes
> > in the upstream kernel that work for both PV and HVM, is it
> > time to drop pvclock (at least as the default for PV)?
> >
> > Certainly if an old (non-pv-ops) kernel is broken, something like
> > David's patch might be an acceptable workaround. I'm just arguing
> > against perpetuating pvclock-as-the-only-xen-clock upstream.
>
> Afaict, the only uniformly reliable clocksource for PV guests is the
> virtual one which pvclock builds upon. Raw TSC is definitely not an
> option on NUMA systems (and PV guests aren't aware of the
> NUMAness of the underlying system).
You'll have to define NUMA. On "old" NUMA systems, where there are
multiple motherboards, your statement is true. On newer systems
where NUMA simply means there are multiple memory controllers and
all of them are cache-coherent, even when there are multiple
"motherboards" joined by HT or QPI, processor and system vendors
take great pains to ensure that the clock signal (and thus TSC) is
synchronized and "stable" across all cpus.
But I agree there ARE exceptions... for those, I proposed a Xen boot
option that said "don't trust TSC even if all the evidence implies
that you can", but Keir shot it down (also years ago).
Dan