[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aBy2AGIFi34q031x@google.com>
Date: Thu, 8 May 2025 06:47:44 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Dapeng Mi <dapeng1.mi@...ux.intel.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
Seth Forshee <sforshee@...nel.org>
Subject: Re: [PATCH] perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest
with vCPU's value.
On Tue, May 06, 2025, Dapeng Mi wrote:
>
> On 5/2/2025 11:04 PM, Sean Christopherson wrote:
> > On Sun, Apr 27, 2025, Dapeng Mi wrote:
> >> On 4/26/2025 8:13 AM, Sean Christopherson wrote:
> >> Currently we have this Sean's fix, only the guest PEBS event bits of
> >> IA32_PEBS_ENABLE MSR are enabled in non-root mode, suppose we can simply
> >> change global_ctrl guest value calculation to this.
> >>
> >> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> >> index 09d2d66c9f21..5bc56bb616ec 100644
> >> --- a/arch/x86/events/intel/core.c
> >> +++ b/arch/x86/events/intel/core.c
> >> @@ -4342,9 +4342,12 @@ static struct perf_guest_switch_msr
> >> *intel_guest_get_msrs(int *nr, void *data)
> >> arr[global_ctrl] = (struct perf_guest_switch_msr){
> >> .msr = MSR_CORE_PERF_GLOBAL_CTRL,
> >> .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
> >> - .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask,
> >> + .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask,
> >> };
> > Hmm, that's not as clear cut. PEBS needs to be disabled because leaving it enabled
> > will crash the guest. For the counter itself, unless leaving it enabled breaks
> > perf and/or degrades the sampling, I don't think there's an obvious right/wrong
> > approach.
> >
> > E.g. if the host wants to profile host and guest, then keeping the count running
> > while the guest is active might be a good thing. It's still far, far from
> > perfect, as a counter that overflows when the guest is active won't generate a
> > PEBS record, but without digging further, it's not obvious that even that flaw
> > is overall worse than always disabling the counter.
>
> Hmm, if the host PEBS event only samples host side, whether the HW counter
> or the PEBS engine would be disabled once VM enters non-root mode, the KVM
> PEBS implementation is correct. But for the host PEBS events which sampling
> both host and guest, the implementation seems incorrect.
Well, yeah, because there is no correct implementation.
> As the below code shows, as long as there are host PEBS events regardless
> of the host PEBS events only sample guest or both host and guest, the host
> PEBS events would be disabled on both HW counters and PEBS engine once VM
> enters non-root mode.
>
> arr[global_ctrl] = (struct perf_guest_switch_msr){
> .msr = MSR_CORE_PERF_GLOBAL_CTRL,
> .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
> .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask,
> };
>
> if (arr[pebs_enable].host) {
> /* Disable guest PEBS if host PEBS is enabled. */
> arr[pebs_enable].guest = 0;
>
> }
>
> So the host PEBS events which hopes to sample both host and guest only
> samples host side in fact. This is unexpected.
It's only unexpected in the sense that it's probably not well documented. Because
the DS buffer is virtually address, there simply isn't a sane way to enable PEBS
(or any feature that utizies the DS buffer) while running a KVM guest that isn't
enlightened to explicitly allow profiling via host PEBS (and AFAIK, no such guest
exists).
Even when KVM is using shadowing paging, i.e. fully controls the page tables used
while the guest is running, enabling PEBS isn't feasible as KVM has no way to
prevent the guest from using the virtual address. E.g. KVM could shove in mappings
for the DS buffer, but that DoS the guest if the guest wants to use the same
virtual address range for its own purposes, and would be a massive data leak to
the guest since the guest could read host data from the buffer.
Powered by blists - more mailing lists