[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4cdf71ca-c4e4-49a5-a64d-d0549ad2cf7b@oracle.com>
Date: Tue, 16 Apr 2024 17:29:21 -0400
From: Alejandro Jimenez <alejandro.j.jimenez@...cle.com>
To: Paolo Bonzini <pbonzini@...hat.com>,
Sean Christopherson <seanjc@...gle.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
joao.m.martins@...cle.com, boris.ostrovsky@...cle.com,
mark.kanda@...cle.com, suravee.suthikulpanit@....com,
mlevitsk@...hat.com
Subject: Re: [RFC 0/3] Export APICv-related state via binary stats interface
On 4/16/24 15:51, Paolo Bonzini wrote:
> On Tue, Apr 16, 2024 at 8:08 PM Sean Christopherson <seanjc@...gle.com> wrote:
>>
>> On Thu, Feb 15, 2024, Alejandro Jimenez wrote:
>>> The goal of this RFC is to agree on a mechanism for querying the state (and
>>> related stats) of APICv/AVIC. I clearly have an AVIC bias when approaching this
>>> topic since that is the side that I have mostly looked at, and has the greater
>>> number of possible inhibits, but I believe the argument applies for both
>>> vendor's technologies.
>>>
>>> Currently, a user or monitoring app trying to determine if APICv is actually
>>> being used needs implementation-specific knowlegde in order to look for specific
>>> types of #VMEXIT (i.e. AVIC_INCOMPLETE_IPI/AVIC_NOACCEL), checking GALog events
>>> by watching /proc/interrupts for AMD-Vi*-GA, etc. There are existing tracepoints
>>> (e.g. kvm_apicv_accept_irq, kvm_avic_ga_log) that make this task easier, but
>>> tracefs is not viable in some scenarios. Adding kvm debugfs entries has similar
>>> downsides. Suravee has previously proposed a new IOCTL interface[0] to expose
>>> this information, but there has not been any development in that direction.
>>> Sean has mentioned a preference for using BPF to extract info from the current
>>> tracepoints, which would require reworking existing structs to access some
>>> desired data, but as far as I know there isn't any work done on that approach
>>> yet.
>>>
>>> Recently Joao mentioned another alternative: the binary stats framework that is
>>> already supported by kernel[1] and QEMU[2].
>>
>> The hiccup with stats are that they are ABI, e.g. we can't (easily) ditch stats
>> once they're added, and KVM needs to maintain the exact behavior.
>
> Stats are not ABI---why would they be? They have an established
> meaning and it's not a good idea to change it, but it's not an
> absolute no-no(*); and removing them is not a problem at all.
>
> For example, if stats were ABI, there would be no need for the
> introspection mechanism, you could just use a struct like ethtool
> stats (which *are* ABO).
>
> Not everything makes a good stat but, if in doubt and it's cheap
> enough to collect it, go ahead and add it. Cheap collection is the
> important point, because tracepoints in a hot path can be so expensive
> as to slow down the guest substantially, at least in microbenchmarks.
>
> In this case I'm not sure _all_ inhibits makes sense and I certainly
> wouldn't want a bitmask,
I wanted to be able to query enough info via stats (i.e. is APICv active, and if
not, why is it inhibited?) that is exposed via the other interfaces which are not
always available. That unfortunately requires a bit of "overloading" of
the stat as I mentioned earlier, but it remains cheap to collect and expose.
What would be your preferred interface to provide the (complete) APICv state?
but a generic APICv-enabled stat certainly
> makes sense, and perhaps another for a weirdly-configured local APIC.
Can you expand on what you mean by "weirdly-configured"? Do you mean something
like virtual wire mode?
Alejandro
>
> Paolo
>
> (*) you have to draw a line somewhere. New processor models may
> virtualize parts of the CPU in such a way that some stats become
> meaningless or just stay at zero. Should KVM not support those
> features because it is not possible anymore to introspect the guest
> through stat?
>
>> Tracepoints are explicitly not ABI, and so we can be much more permissive when it
>> comes to adding/expanding tracepoints, specifically because there are no guarantees
>> provided to userspace.
>>
>
>
Powered by blists - more mailing lists