Message-ID: <CALMp9eSQWTTGQoJQ+f=ondF2wiiCaMiO-PMV0eaYJNXXrt4gQA@mail.gmail.com>
Date:   Tue, 30 May 2023 13:00:57 -0700
From:   Jim Mattson <jmattson@...gle.com>
To:     Like Xu <like.xu.linux@...il.com>
Cc:     Sean Christopherson <seanjc@...gle.com>,
        Sandipan Das <sandipan.das@....com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Ravi Bangoria <ravi.bangoria@....com>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        Santosh Shukla <santosh.shukla@....com>,
        "Tom Lendacky (AMD)" <thomas.lendacky@....com>,
        Ananth Narayan <ananth.narayan@....com>
Subject: Re: [PATCH 5/5] KVM: x86/pmu: Hide guest counter updates from the
 VMRUN instruction
On Mon, May 29, 2023 at 7:51 AM Like Xu <like.xu.linux@...il.com> wrote:
>
> On 25/5/2023 5:32 am, Jim Mattson wrote:
> > On Wed, May 24, 2023 at 2:29 PM Sean Christopherson <seanjc@...gle.com> wrote:
> >>
> >> On Wed, May 24, 2023, Jim Mattson wrote:
> >>> On Wed, May 24, 2023 at 1:41 PM Sean Christopherson <seanjc@...gle.com> wrote:
> >>>>
> >>>> On Wed, Apr 26, 2023, Sandipan Das wrote:
> >>>>> Hi Sean, Like,
> >>>>>
> >>>>> On 4/19/2023 7:11 PM, Like Xu wrote:
> >>>>>>> Heh, it's very much explicable, it's just not desirable, and you and I would argue
> >>>>>>> that it's also incorrect.
> >>>>>>
> >>>>>> This is completely inaccurate from the end guest pmu user's perspective.
> >>>>>>
> >>>>>> I have a toy that looks like virtio-pmu, through which guest users can get hypervisor performance data.
> >>>>>> But the side effect of letting the guest see the VMRUN instruction by default is unacceptable, isn't it?
> >>>>>>
> >>>>>>>
> >>>>>>> AMD folks, are there plans to document this as an erratum?  I agree with Like that
> >>>>>>> counting VMRUN as a taken branch in guest context is a CPU bug, even if the behavior
> >>>>>>> is known/expected.
> >>>>>>
> >>>>>
> >>>>> This behaviour is architectural and an erratum will not be issued. However, for clarity, a future
> >>>>> release of the APM will include additional details like the following:
> >>>>>
> >>>>>    1) From the perspective of performance monitoring counters, VMRUNs are considered as far control
> >>>>>       transfers and VMEXITs as exceptions.
> >>>>>
> >>>>>    2) When the performance monitoring counters are set up to count events only in certain modes
> >>>>>       through the "OsUserMode" and "HostGuestOnly" bits, instructions and events that change the
> >>>>>       mode are counted in the target mode. For example, a SYSCALL from CPL 3 to CPL 0 with a
> >>>>>       counter set to count retired instructions with USR=1 and OS=0 will not cause an increment of
> >>>>>       the counter. However, the SYSRET back from CPL 0 to CPL 3 will cause an increment of the
> >>>>>       counter and the total count will end up correct. Similarly, when counting PMCx0C6 (retired
> >>>>>       far control transfers, including exceptions and interrupts) with Guest=1 and Host=0, a VMRUN
> >>>>>       instruction will cause an increment of the counter. However, the subsequent VMEXIT that occurs,
> >>>>>       since the target is in the host, will not cause an increment of the counter and so the total
> >>>>>       count will end up correct.
> >>>>
> >>>> The count from the guest's perspective does not "end up correct".  Unlike SYSCALL,
> >>>> where _userspace_ deliberately and synchronously executes a branch instruction,
> >>>> VMEXIT and VMRUN are supposed to be transparent to the guest and can be completely
> >>>> asynchronous with respect to guest code execution, e.g. if the host is spamming
> >>>> IRQs, the guest will see a potentially large number of bogus (from its perspective)
> >>>> branches retired.
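
To make the "counted in the target mode" rule concrete, here is a
minimal sketch in C. It is a toy model, not hardware or KVM code: the
struct, the function names, and the one-VMEXIT-plus-one-VMRUN shape of
a host interrupt are all assumptions made for illustration. It shows
that with Guest=1 and Host=0, every host-induced round trip leaves one
far control transfer on the guest-only counter.

```c
/* Toy model of the rule quoted above; every name here is hypothetical. */
enum mode { HOST, GUEST };

struct counter {
    int count_guest;        /* models the Guest bit of HostGuestOnly */
    int count_host;         /* models the Host bit of HostGuestOnly */
    unsigned long value;
};

/* A mode-changing event is attributed to the mode it lands in. */
static void far_transfer(struct counter *c, enum mode target)
{
    if ((target == GUEST && c->count_guest) ||
        (target == HOST && c->count_host))
        c->value++;
}

/* Assumed shape of one host interrupt taken while the guest runs. */
static void host_irq_round_trip(struct counter *c)
{
    far_transfer(c, HOST);  /* VMEXIT: target is the host, not counted */
    far_transfer(c, GUEST); /* VMRUN: target is the guest, counted */
}

/* Far control transfers seen by a Guest=1/Host=0 counter. */
static unsigned long guest_visible_count(int host_irqs)
{
    struct counter c = { .count_guest = 1, .count_host = 0, .value = 0 };

    for (int i = 0; i < host_irqs; i++)
        host_irq_round_trip(&c);
    return c.value;
}
```

In this model guest_visible_count(1000) returns 1000 even though the
guest itself executed no far control transfers, which is the
discrepancy described above.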
> >>>
> >>> The reverse problem occurs when a PMC is configured to count "CPUID
> >>> instructions retired." Since KVM intercepts CPUID and emulates it, the
> >>> PMC will always read 0, even if the guest executes a tight loop of
> >>> CPUID instructions.
>
> Unlikely. KVM will count any emulated instructions based on kvm_pmu_incr_counter().
> Did I miss some conditions?
That code only increments PMCs configured to count "instructions
retired" and "branch instructions retired." It does not increment PMCs
configured to count "CPUID instructions retired."
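
As a rough illustration of that gap, the sketch below models a
software fixup that recognizes only the two generic events. It is not
the KVM implementation: the enum, the struct, and the function names
are hypothetical, and real event selection is done by event encoding
rather than by a tidy enum.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event classes for this sketch; not KVM's definitions. */
enum event_class {
    EV_INSTRUCTIONS_RETIRED,
    EV_BRANCH_INSTRUCTIONS_RETIRED,
    EV_CPUID_INSTRUCTIONS_RETIRED,
};

struct vpmc {
    enum event_class event; /* what the guest programmed the PMC to count */
    uint64_t value;
};

/* Only the two generic events are recognized by the software fixup. */
static bool fixup_handles(enum event_class ev)
{
    return ev == EV_INSTRUCTIONS_RETIRED ||
           ev == EV_BRANCH_INSTRUCTIONS_RETIRED;
}

/* Called once per emulated instruction; is_branch marks emulated branches. */
static void account_emulated_insn(struct vpmc *pmcs, int n, bool is_branch)
{
    for (int i = 0; i < n; i++) {
        if (!fixup_handles(pmcs[i].event))
            continue;   /* e.g. "CPUID instructions retired" is skipped */
        if (pmcs[i].event == EV_BRANCH_INSTRUCTIONS_RETIRED && !is_branch)
            continue;
        pmcs[i].value++;
    }
}
```

Calling account_emulated_insn() once per emulated CPUID (with
is_branch false) advances a PMC set to EV_INSTRUCTIONS_RETIRED, while
a PMC set to EV_CPUID_INSTRUCTIONS_RETIRED stays at 0.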
> >>>
> >>> The PMU is not virtualizable on AMD CPUs without significant
> >>> hypervisor corrections. I have to wonder if it's really worth the
> >>> effort.
>
> I used to think so, until I saw the AMD64_EVENTSEL_GUESTONLY bit.
> Hardware architects are expected to put more effort into this area.
>
> >>
> >> Per our offlist chat, my understanding is that there are caveats with vPMUs that
> >> it's simply not feasible for a hypervisor to handle.  I.e. virtualizing any x86
> >> PMU with 100% accuracy isn't happening anytime soon.
>
> Indeed, and any more detailed complaints?
Reference cycles unhalted fails to increment outside of guest mode.
SMIs received counts *physical* rather than virtual SMIs.
Interrupts taken counts *physical* rather than virtual interrupts taken.
> >>
> >> The way forward is likely to evaluate each caveat on a case-by-case basis to
> >> determine whether or not the cost of the fixup in KVM is worth the benefit to
> >> the guest.  E.g. emulating "CPUID instructions retired" seems like it would be
> >> fairly straightforward.  AFAICT, fixing up the VMRUN stuff is quite difficult though.
> >
> > Yeah. The problem with fixing up "CPUID instructions retired" is
> > tracking what the event encoding is for every F/M/S out there. It's
> > not worth it.
>
> I don't think it's feasible to emulate 100% accuracy on Intel. For guest pmu
> users, it is motivated by wanting to know how effective they are running on
> the current pCPU, and any vPMU emulation behavior that helps this
> understanding would be valuable.
But at least Intel has a list of architected events, which are mostly
amenable to virtualization.
