linux-kernel - Re: [RFC PATCH 2/2] KVM: arm64: export cntvoff in debugfs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87czmw6zmo.wl-maz@kernel.org>
Date:   Fri, 19 Nov 2021 13:31:11 +0000
From:   Marc Zyngier <maz@...nel.org>
To:     Marcelo Tosatti <mtosatti@...hat.com>
Cc:     Nicolas Saenz Julienne <nsaenzju@...hat.com>,
        linux-arm-kernel@...ts.infradead.org, rostedt@...dmis.org,
        james.morse@....com, alexandru.elisei@....com,
        suzuki.poulose@....com, catalin.marinas@....com, will@...nel.org,
        linux-kernel@...r.kernel.org, kvmarm@...ts.cs.columbia.edu,
        mingo@...hat.com, nilal@...hat.com
Subject: Re: [RFC PATCH 2/2] KVM: arm64: export cntvoff in debugfs

On Fri, 19 Nov 2021 12:59:46 +0000,
Marcelo Tosatti <mtosatti@...hat.com> wrote:
> 
> On Fri, Nov 19, 2021 at 12:17:00PM +0000, Marc Zyngier wrote:
> > On Fri, 19 Nov 2021 10:21:18 +0000,
> > Nicolas Saenz Julienne <nsaenzju@...hat.com> wrote:
> > > 
> > > While using cntvct as the raw clock for tracing, it's possible to
> > > synchronize host/guest traces just by knowing the virtual offset applied
> > > to the guest's virtual counter.
> > > 
> > > This is also the case on x86 when TSC is available. The offset is
> > > exposed in debugfs as 'tsc-offset' on a per vcpu basis. So let's
> > > implement the same for arm64.
> > 
> > How does this work with NV, where the guest hypervisor is in control
> > of the virtual offset? How does userspace knows which vcpu to pick so
> > that it gets the right offset?
> 
> On x86, the offsets for different vcpus are the same due to the logic at
> kvm_synchronize_tsc function:
> 
> During guest vcpu creation, when the TSC-clock values are written
> in a short window of time (or the clock value is zero), the code uses
> the same TSC.
> 
> This logic is problematic (since "short window of time" is a heuristic 
> which can fail), and is being replaced by writing the same offset
> for each vCPU:
> 
> commit 828ca89628bfcb1b8f27535025f69dd00eb55207
> Author: Oliver Upton <oupton@...gle.com>
> Date:   Thu Sep 16 18:15:38 2021 +0000
> 
>     KVM: x86: Expose TSC offset controls to userspace
>     
>     To date, VMM-directed TSC synchronization and migration has been a bit
>     messy. KVM has some baked-in heuristics around TSC writes to infer if
>     the VMM is attempting to synchronize. This is problematic, as it depends
>     on host userspace writing to the guest's TSC within 1 second of the last
>     write.
>     
>     A much cleaner approach to configuring the guest's views of the TSC is to
>     simply migrate the TSC offset for every vCPU. Offsets are idempotent,
>     and thus not subject to change depending on when the VMM actually
>     reads/writes values from/to KVM. The VMM can then read the TSC once with
>     KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
>     the guest is paused.
> 
> So with that in place, the answer to 
> 
> How does userspace knows which vcpu to pick so
> that it gets the right offset?
> 
> is any vcpu, since the offsets are the same.

As I just said above, this assertion doesn't hold true once you have
nested virt, because the offset is per-cpu, and is adjusted to mean
different things on different hypervisors (some hypervisors expose
stolen time through it, for example).

What this patch is doing is to expose a Linux-specific behaviour, and
try to derive properties from it. It really doesn't work in general.

> 
> > I also wonder why we need this when userspace already has direct
> > access to that information without any extra kernel support (read the
> > CNTVCT view of the vcpu using the ONEREG API, subtract it from the
> > host view of the counter, job done).
> 
> If guest has access to the clock offset (between guest and host), then
> in the guest:
> 
> 	clockval = hostclockval - clockoffset
> 
> Adding "clockoffset" to that will retrieve the host clock.
> 
> Is that what you mean?

No. The *VMM* (qemu, kvmtool, crosvm, insertyourfavouriteonehere) has
already access to it. Why do we need an extra interface?

	M.

-- 
Without deviation from the norm, progress is not possible.