linux-kernel - Re: [RFC PATCH 2/2] KVM: arm64: export cntvoff in debugfs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <0e948a211bd8d63ba05594fb8c03bf3a77a227a0.camel@redhat.com>
Date:   Mon, 22 Nov 2021 21:40:52 +0100
From:   Nicolas Saenz Julienne <nsaenzju@...hat.com>
To:     Marc Zyngier <maz@...nel.org>
Cc:     linux-arm-kernel@...ts.infradead.org, rostedt@...dmis.org,
        james.morse@....com, alexandru.elisei@....com,
        suzuki.poulose@....com, catalin.marinas@....com, will@...nel.org,
        linux-kernel@...r.kernel.org, kvmarm@...ts.cs.columbia.edu,
        mingo@...hat.com, mtosatti@...hat.com, nilal@...hat.com
Subject: Re: [RFC PATCH 2/2] KVM: arm64: export cntvoff in debugfs

Hi Marc, thanks for the review.

On Fri, 2021-11-19 at 12:17 +0000, Marc Zyngier wrote:
> On Fri, 19 Nov 2021 10:21:18 +0000,
> Nicolas Saenz Julienne <nsaenzju@...hat.com> wrote:
> > 
> > While using cntvct as the raw clock for tracing, it's possible to
> > synchronize host/guest traces just by knowing the virtual offset applied
> > to the guest's virtual counter.
> > 
> > This is also the case on x86 when TSC is available. The offset is
> > exposed in debugfs as 'tsc-offset' on a per vcpu basis. So let's
> > implement the same for arm64.
> 
> How does this work with NV, where the guest hypervisor is in control
> of the virtual offset? 

TBH I handn't thought about NV. Looking at it from that angle, I now see my
approach doesn't work on hosts that use CNTVCT (regardless of NV). Upon
entering into a guest, we change CNTVOFF before the host is done with tracing,
so traces like 'kvm_entry' will have weird timestamps. I was just lucky that
the hosts I was testing with use CNTPCT.

I believe the solution would be to be able to force a 0 offset between
guest/host. With that in mind, is there a reason why kvm_timer_vcpu_init()
imposes a non-zero one by default? I checked out the commits that introduced
that code, but couldn't find a compelling reason. VMMs can always change it
through KVM_REG_ARM_TIMER_CNT afterwards.

> I also wonder why we need this when userspace already has direct access to
> that information without any extra kernel support (read the CNTVCT view of
> the vcpu using the ONEREG API, subtract it from the host view of the counter,
> job done).

Well IIUC, you're at the mercy of how long it takes to return from the ONEREG
ioctl. The results will be skewed. For some workloads, where low latency is
key, we really need high precision traces in the order of single digit us or
even 100s of ns. I'm not sure you'll be able to get there with that approach.

[...]

> > diff --git a/arch/arm64/kvm/debugfs.c b/arch/arm64/kvm/debugfs.c
> > new file mode 100644
> > index 000000000000..f0f5083ea8d4
> > --- /dev/null
> > +++ b/arch/arm64/kvm/debugfs.c
> > @@ -0,0 +1,25 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Copyright (C) 2021 Red Hat Inc.
> > + */
> > +
> > +#include <linux/kvm_host.h>
> > +#include <linux/debugfs.h>
> > +
> > +#include <kvm/arm_arch_timer.h>
> > +
> > +static int vcpu_get_cntv_offset(void *data, u64 *val)
> > +{
> > +	struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
> > +
> > +	*val = timer_get_offset(vcpu_vtimer(vcpu));
> > +
> > +	return 0;
> > +}
> > +
> > +DEFINE_SIMPLE_ATTRIBUTE(vcpu_cntvoff_fops, vcpu_get_cntv_offset, NULL, "%lld\n");
> > +
> > +void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu, struct dentry *debugfs_dentry)
> > +{
> > +	debugfs_create_file("cntvoff", 0444, debugfs_dentry, vcpu, &vcpu_cntvoff_fops);
> > +}
> 
> This should be left in arch_timer.c until we actually need it for
> multiple subsystems. When (and if) that happens, we will expose
> per-subsystem debugfs initialisers instead of exposing the guts of the
> timer code.

Noted.

-- 
Nicolás Sáenz