linux-kernel - Re: [RFC 0/3] Export APICv-related state via binary stats interface

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <CABgObfZ4kqaXLaOAOj4aGB5GAe9GxOmJmOP+7kdke6OqA35HzA@mail.gmail.com>
Date: Tue, 16 Apr 2024 21:51:10 +0200
From: Paolo Bonzini <pbonzini@...hat.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Alejandro Jimenez <alejandro.j.jimenez@...cle.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org, joao.m.martins@...cle.com, 
	boris.ostrovsky@...cle.com, mark.kanda@...cle.com, 
	suravee.suthikulpanit@....com, mlevitsk@...hat.com
Subject: Re: [RFC 0/3] Export APICv-related state via binary stats interface

On Tue, Apr 16, 2024 at 8:08 PM Sean Christopherson <seanjc@...gle.com> wrote:
>
> On Thu, Feb 15, 2024, Alejandro Jimenez wrote:
> > The goal of this RFC is to agree on a mechanism for querying the state (and
> > related stats) of APICv/AVIC. I clearly have an AVIC bias when approaching this
> > topic since that is the side that I have mostly looked at, and has the greater
> > number of possible inhibits, but I believe the argument applies to both
> > vendors' technologies.
> >
> > Currently, a user or monitoring app trying to determine if APICv is actually
> > being used needs implementation-specific knowledge in order to look for specific
> > types of #VMEXIT (i.e. AVIC_INCOMPLETE_IPI/AVIC_NOACCEL), checking GALog events
> > by watching /proc/interrupts for AMD-Vi*-GA, etc. There are existing tracepoints
> > (e.g. kvm_apicv_accept_irq, kvm_avic_ga_log) that make this task easier, but
> > tracefs is not viable in some scenarios. Adding kvm debugfs entries has similar
> > downsides. Suravee has previously proposed a new IOCTL interface[0] to expose
> > this information, but there has not been any development in that direction.
> > Sean has mentioned a preference for using BPF to extract info from the current
> > tracepoints, which would require reworking existing structs to access some
> > desired data, but as far as I know there isn't any work done on that approach
> > yet.
> >
> > Recently Joao mentioned another alternative: the binary stats framework that is
> > already supported by kernel[1] and QEMU[2].
>
> The hiccup with stats is that they are ABI, i.e. we can't (easily) ditch stats
> once they're added, and KVM needs to maintain the exact behavior.

Stats are not ABI---why would they be? They have an established
meaning and it's not a good idea to change it, but it's not an
absolute no-no(*); and removing them is not a problem at all.

For example, if stats were ABI, there would be no need for the
introspection mechanism; you could just use a fixed struct like ethtool
stats (which *are* ABI).

Not everything makes a good stat, but if in doubt, and if it's cheap
enough to collect, go ahead and add it. Cheap collection is the
important point, because tracepoints in a hot path can be so expensive
as to slow down the guest substantially, at least in microbenchmarks.

In this case I'm not sure _all_ inhibits make sense and I certainly
wouldn't want a bitmask, but a generic APICv-enabled stat certainly
makes sense, and perhaps another for a weirdly-configured local APIC.

Paolo

(*) you have to draw a line somewhere. New processor models may
virtualize parts of the CPU in such a way that some stats become
meaningless or just stay at zero. Should KVM not support those
features because it is not possible anymore to introspect the guest
through stats?

> Tracepoints are explicitly not ABI, and so we can be much more permissive when it
> comes to adding/expanding tracepoints, specifically because there are no guarantees
> provided to userspace.
>