Date:	Thu, 17 Mar 2016 11:22:40 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	Radim Krcmar <rkrcmar@...hat.com>
Cc:	Paolo Bonzini <pbonzini@...hat.com>,
	Alexander Graf <agraf@...e.de>, kvm list <kvm@...r.kernel.org>,
	Marcelo Tosatti <mtosatti@...hat.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	X86 ML <x86@...nel.org>
Subject: Re: [PATCH 1/5] x86/kvm: On KVM re-enable (e.g. after suspend),
 update clocks

On Mar 17, 2016 8:10 AM, "Radim Krcmar" <rkrcmar@...hat.com> wrote:
>
> 2016-03-16 16:07-0700, Andy Lutomirski:
> > On Wed, Mar 16, 2016 at 3:59 PM, Radim Krcmar <rkrcmar@...hat.com> wrote:
> >> 2016-03-16 15:15-0700, Andy Lutomirski:
> >>> FWIW, if you ever intend to support ART ("always running timer")
> >>> passthrough, this is going to be a giant clusterfsck.  Good luck.  I
> >>> haven't gotten a straight answer as to what hardware actually supports
> >>> that thing, so even testing isn't easy.
> >>
> >> Hm, AR TSC would be best handled by doing nothing ... dropping the
> >> faking logic just became tempting.
>
> ART is different from what I initially thought, it's the underlying
> mechanism for invariant TSC and nothing more ...  we already forbid
> migrations when the guest knows about invariant TSC, so we could do the
> same and let ART be virtualized.  (Suspend has to be forbidden too.)

It's more than that -- it's a TSC-like clock that can be read by PCIe devices.

>
> > As it stands, ART is screwed if you adjust the VMCS's tsc offset.  But
>
> Luckily, assigning real hardware can prevent migration or suspend, so we
> won't need to adjust the offset during runtime.  TSC is a generally
> unmigratable device that just happens to live on the CPU.
>
> (It would have been better to hide TSC capability from the guest and only
>  use rdtsc for kvmclock if the guest wanted fancy features.)
>

I think that, if KVM passes through an ART-supporting NIC, it might be
rather messy to try to avoid passing through TSC as well.  But maybe a
pvclock-like structure could expose the ART-kvmclock offset and scale.
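
To make the pvclock-like idea concrete, here is a rough sketch of what such a structure and its reader might look like. Everything in it is hypothetical: the struct name, the field names, and the helper are made up for illustration and mirror the shift/multiplier scheme pvclock uses for TSC, not any existing ABI. A real reader would also need to retry on a changing version field with proper barriers, which the sketch leaves out.

```c
#include <stdint.h>

/* Hypothetical layout modeled on pvclock's time info; not an existing ABI. */
struct art_clock_info {
    uint32_t version;            /* would be odd while the host updates the fields */
    int8_t   art_shift;          /* shift applied to the ART delta before scaling */
    uint32_t art_to_system_mul;  /* multiplier, 32 fractional bits */
    uint64_t art_timestamp;      /* ART value at the last update */
    uint64_t system_time;        /* kvmclock time in ns at art_timestamp */
};

/* Convert an ART reading to kvmclock nanoseconds (hypothetical helper). */
static uint64_t art_to_kvmclock_ns(const struct art_clock_info *info,
                                   uint64_t art)
{
    uint64_t delta = art - info->art_timestamp;

    if (info->art_shift < 0)
        delta >>= -info->art_shift;
    else
        delta <<= info->art_shift;

    /* (delta * mul) >> 32, split into halves so no 128-bit type is needed. */
    uint64_t hi = delta >> 32;
    uint64_t lo = delta & 0xffffffffu;
    uint64_t scaled = hi * info->art_to_system_mul +
                      ((lo * info->art_to_system_mul) >> 32);

    return info->system_time + scaled;
}
```

With something like this, the host could republish the offset and scale after a migration or resume, and the guest would never have to derive them from CPUID and the TSC itself.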

> > I think it's also screwed if you migrate to a machine with a different
> > ratio of guest TSC ticks to host ART ticks or a different offset,
> > because the host isn't going to do the rdmsr every time it tries to
> > access the ART, so passing it through might require a paravirt
> > mechanism no matter what.
>
> It's almost certain that the other host will have a different offset,
> which makes TSC unmigratable in software without even considering ART
> or frequencies.  Well, KVM already emulates different TSC frequency, so
> we could emulate ART without sinking much lower. :)
>
> > ISTM that, if KVM tries to keep the guest TSC monotonic across
> > migration, it should probably also keep it monotonic across host
> > suspend/resume.
>
> Yes, "pausing" TSC during suspend or migration is one way of improving
> the TSC estimate.  If we want to emulate ART, then the estimate is
> noticeably lacking, because TSC and ART are defined by a simple
> equation (SDM 2015-12, 17.14.4 Invariant Time-Keeping):
>  TSC_Value = (ART_Value * CPUID.15H:EBX[31:0]) / CPUID.15H:EAX[31:0] + K
>
> where the guest thinks that CPUID and K are constant (between events
> that the guest knows of), so we should give the best estimate of how
> many TSC cycles have passed.  (The best estimate is still lacking.)
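
For reference, the SDM equation quoted above can be evaluated along these lines. This is only a sketch: the struct and function names are invented here, and it assumes the guest has already read the CPUID.15H ratio and knows K. Splitting ART into quotient and remainder keeps the intermediate product from overflowing 64 bits.

```c
#include <stdint.h>

/* Hypothetical container for the CPUID.15H ratio and the offset K. */
struct art_tsc_params {
    uint32_t numerator;    /* CPUID.15H:EBX[31:0] */
    uint32_t denominator;  /* CPUID.15H:EAX[31:0], must be nonzero */
    uint64_t k;            /* offset K from the SDM formula */
};

/*
 * TSC_Value = (ART_Value * numerator) / denominator + K.
 * Dividing first and handling the remainder separately avoids
 * overflowing the 64-bit intermediate product for large ART values.
 */
static uint64_t art_to_tsc(const struct art_tsc_params *p, uint64_t art)
{
    uint64_t quot = art / p->denominator;
    uint64_t rem  = art % p->denominator;

    /* rem and numerator are both below 2^32, so rem * numerator fits. */
    return quot * p->numerator +
           (rem * p->numerator) / p->denominator + p->k;
}
```

The guest-visible problem falls out directly: if the host changes the TSC offset behind the guest's back, K silently changes and every value computed this way is off by the same amount.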
>
> >                  After all, host suspend/resume is kind of like
> > migrating from the pre-suspend host to the post-resume host.  Maybe it
> > could even share code.
>
> Hopefully ... host suspend/resume is driven by kernel and migration is
> driven by userspace, which might complicate sharing.

Good point.

--Andy
