lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABgObfZugCVm7xysXBB=OZhQumga1bvit8oXbfhyjk7ncCa4NA@mail.gmail.com>
Date: Tue, 16 Apr 2024 23:36:36 +0200
From: Paolo Bonzini <pbonzini@...hat.com>
To: Alejandro Jimenez <alejandro.j.jimenez@...cle.com>
Cc: Sean Christopherson <seanjc@...gle.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org, 
	joao.m.martins@...cle.com, boris.ostrovsky@...cle.com, mark.kanda@...cle.com, 
	suravee.suthikulpanit@....com, mlevitsk@...hat.com
Subject: Re: [RFC 0/3] Export APICv-related state via binary stats interface

On Tue, Apr 16, 2024 at 11:29 PM Alejandro Jimenez
<alejandro.j.jimenez@...cle.com> wrote:
>
>
>
> On 4/16/24 15:51, Paolo Bonzini wrote:
> > On Tue, Apr 16, 2024 at 8:08 PM Sean Christopherson <seanjc@...gle.com> wrote:
> >>
> >> On Thu, Feb 15, 2024, Alejandro Jimenez wrote:
> >>> The goal of this RFC is to agree on a mechanism for querying the state (and
> >>> related stats) of APICv/AVIC. I clearly have an AVIC bias when approaching this
> >>> topic since that is the side that I have mostly looked at, and has the greater
> >>> number of possible inhibits, but I believe the argument applies for both
> >>> vendor's technologies.
> >>>
> >>> Currently, a user or monitoring app trying to determine if APICv is actually
> >>> being used needs implementation-specific knowlegde in order to look for specific
> >>> types of #VMEXIT (i.e. AVIC_INCOMPLETE_IPI/AVIC_NOACCEL), checking GALog events
> >>> by watching /proc/interrupts for AMD-Vi*-GA, etc. There are existing tracepoints
> >>> (e.g. kvm_apicv_accept_irq, kvm_avic_ga_log) that make this task easier, but
> >>> tracefs is not viable in some scenarios. Adding kvm debugfs entries has similar
> >>> downsides. Suravee has previously proposed a new IOCTL interface[0] to expose
> >>> this information, but there has not been any development in that direction.
> >>> Sean has mentioned a preference for using BPF to extract info from the current
> >>> tracepoints, which would require reworking existing structs to access some
> >>> desired data, but as far as I know there isn't any work done on that approach
> >>> yet.
> >>>
> >>> Recently Joao mentioned another alternative: the binary stats framework that is
> >>> already supported by kernel[1] and QEMU[2].
> >>
> >> The hiccup with stats are that they are ABI, e.g. we can't (easily) ditch stats
> >> once they're added, and KVM needs to maintain the exact behavior.
> >
> > Stats are not ABI---why would they be? They have an established
> > meaning and it's not a good idea to change it, but it's not an
> > absolute no-no(*); and removing them is not a problem at all.
> >
> > For example, if stats were ABI, there would be no need for the
> > introspection mechanism, you could just use a struct like ethtool
> > stats (which *are* ABO).
> >
> > Not everything makes a good stat but, if in doubt and it's cheap
> > enough to collect it, go ahead and add it. Cheap collection is the
> > important point, because tracepoints in a hot path can be so expensive
> > as to slow down the guest substantially, at least in microbenchmarks.
> >
> > In this case I'm not sure _all_ inhibits makes sense and I certainly
> > wouldn't want a bitmask,
>
> I wanted to be able to query enough info via stats (i.e. is APICv active, and if
> not, why is it inhibited?) that is exposed via the other interfaces which are not
> always available. That unfortunately requires a bit of "overloading" of
> the stat as I mentioned earlier, but it remains cheap to collect and expose.
>
> What would be your preferred interface to provide the (complete) APICv state?
>
>   but a generic APICv-enabled stat certainly
> > makes sense, and perhaps another for a weirdly-configured local APIC.
>
> Can you expand on what you mean by "weirdly-configured"? Do you mean something
> like virtual wire mode?

I mean any of

    APICV_INHIBIT_REASON_HYPERV,
    APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED,
    APICV_INHIBIT_REASON_APIC_ID_MODIFIED,
    APICV_INHIBIT_REASON_APIC_BASE_MODIFIED,
    APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED,

which in practice are always going to be APICV_INHIBIT_REASON_HYPERV
on 99.99% of the gues

ExtINT is visible via irq_window_exits; if you are not running nested,
do not have the problematic configurations above. don't have debugging
(BLOCKIRQ) or in-kernel PIT with reinjection, that's basically the
only one that's left.

Paolo

> Alejandro
>
> >
> > Paolo
> >
> > (*) you have to draw a line somewhere. New processor models may
> > virtualize parts of the CPU in such a way that some stats become
> > meaningless or just stay at zero. Should KVM not support those
> > features because it is not possible anymore to introspect the guest
> > through stat?
> >
> >> Tracepoints are explicitly not ABI, and so we can be much more permissive when it
> >> comes to adding/expanding tracepoints, specifically because there are no guarantees
> >> provided to userspace.
> >>
> >
> >
>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ