Message-ID: <aIln5KlHYlIg3Ui-@google.com>
Date: Tue, 29 Jul 2025 17:31:32 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Kan Liang <kan.liang@...ux.intel.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Mingwei Zhang <mizhang@...gle.com>, 
	Ingo Molnar <mingo@...hat.com>, Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>, 
	Paolo Bonzini <pbonzini@...hat.com>, Mark Rutland <mark.rutland@....com>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Ian Rogers <irogers@...gle.com>, Adrian Hunter <adrian.hunter@...el.com>, Liang@...gle.com, 
	"H. Peter Anvin" <hpa@...or.com>, linux-perf-users@...r.kernel.org, 
	linux-kernel@...r.kernel.org, kvm@...r.kernel.org, 
	linux-kselftest@...r.kernel.org, Yongwei Ma <yongwei.ma@...el.com>, 
	Xiong Zhang <xiong.y.zhang@...ux.intel.com>, Dapeng Mi <dapeng1.mi@...ux.intel.com>, 
	Jim Mattson <jmattson@...gle.com>, Sandipan Das <sandipan.das@....com>, 
	Zide Chen <zide.chen@...el.com>, Eranian Stephane <eranian@...gle.com>, 
	Shukla Manali <Manali.Shukla@....com>, Nikunj Dadhania <nikunj.dadhania@....com>
Subject: Re: [PATCH v4 10/38] perf/x86: Support switch_guest_ctx interface

On Fri, Apr 25, 2025, Kan Liang wrote:
> On 2025-04-25 9:43 a.m., Peter Zijlstra wrote:
> > On Fri, Apr 25, 2025 at 09:06:26AM -0400, Liang, Kan wrote:
> >>
> >>
> >> On 2025-04-25 7:15 a.m., Peter Zijlstra wrote:
> >>> On Mon, Mar 24, 2025 at 05:30:50PM +0000, Mingwei Zhang wrote:
> >>>> From: Kan Liang <kan.liang@...ux.intel.com>
> >>>>
> >>>> Implement switch_guest_ctx interface for x86 PMU, switch PMI to dedicated
> >>>> KVM_GUEST_PMI_VECTOR at perf guest enter, and switch PMI back to
> >>>> NMI at perf guest exit.
> >>>>
> >>>> Signed-off-by: Xiong Zhang <xiong.y.zhang@...ux.intel.com>
> >>>> Signed-off-by: Kan Liang <kan.liang@...ux.intel.com>
> >>>> Tested-by: Yongwei Ma <yongwei.ma@...el.com>
> >>>> Signed-off-by: Mingwei Zhang <mizhang@...gle.com>
> >>>> ---
> >>>>  arch/x86/events/core.c | 12 ++++++++++++
> >>>>  1 file changed, 12 insertions(+)
> >>>>
> >>>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> >>>> index 8f218ac0d445..28161d6ff26d 100644
> >>>> --- a/arch/x86/events/core.c
> >>>> +++ b/arch/x86/events/core.c
> >>>> @@ -2677,6 +2677,16 @@ static bool x86_pmu_filter(struct pmu *pmu, int cpu)
> >>>>  	return ret;
> >>>>  }
> >>>>  
> >>>> +static void x86_pmu_switch_guest_ctx(bool enter, void *data)
> >>>> +{
> >>>> +	u32 guest_lvtpc = *(u32 *)data;
> >>>> +
> >>>> +	if (enter)
> >>>> +		apic_write(APIC_LVTPC, guest_lvtpc);
> >>>> +	else
> >>>> +		apic_write(APIC_LVTPC, APIC_DM_NMI);
> >>>> +}
> >>>
> >>> This, why can't it use x86_pmu.guest_lvtpc here and call it a day? Why
> >>> is that argument passed around through the generic code only to get back
> >>> here?
> >>
> >> The vector has to be from the KVM. However, the current interfaces only
> >> support KVM read perf variables, e.g., perf_get_x86_pmu_capability and
> >> perf_get_hw_event_config.
> >> We need to add a new interface to allow KVM to write a perf variable,
> >> e.g., perf_set_guest_lvtpc.
> > 
> > But all that should remain in x86, there is no reason what so ever to
> > leak this into generic code.

Finally prepping v5, and this is one of two <knock wood> comments that aren't
fully addressed.

The vector isn't a problem; that's *always* PERF_GUEST_MEDIATED_PMI_VECTOR and
so doesn't even require anything in x86_pmu.

But whether or not the entry should be masked comes from the guest's LVTPC entry,
and I don't see a cleaner way to get that information into x86, especially since
the switch between host and guest PMI needs to happen in the "perf context disabled"
section.

I think/hope I dressed up the code so that it's not _so_ ugly, and so that it's
fully extensible in the unlikely event a non-x86 arch were to ever support a
mediated vPMU, e.g. @data could be used to pass a pointer to a struct.

  void perf_load_guest_context(unsigned long data)
  {
	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);

	lockdep_assert_irqs_disabled();

	guard(perf_ctx_lock)(cpuctx, cpuctx->task_ctx);

	if (WARN_ON_ONCE(__this_cpu_read(guest_ctx_loaded)))
		return;

	perf_ctx_disable(&cpuctx->ctx, EVENT_GUEST);
	ctx_sched_out(&cpuctx->ctx, NULL, EVENT_GUEST);
	if (cpuctx->task_ctx) {
		perf_ctx_disable(cpuctx->task_ctx, EVENT_GUEST);
		task_ctx_sched_out(cpuctx->task_ctx, NULL, EVENT_GUEST);
	}

	arch_perf_load_guest_context(data);

	...
  }

  void arch_perf_load_guest_context(unsigned long data)
  {
	u32 masked = data & APIC_LVT_MASKED;

	apic_write(APIC_LVTPC,
		   APIC_DM_FIXED | PERF_GUEST_MEDIATED_PMI_VECTOR | masked);
	this_cpu_write(x86_guest_ctx_loaded, true);
  }
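
For anyone who wants to poke at the LVTPC value outside the kernel, here is a
minimal user-space sketch of the same bit manipulation.  Everything prefixed
with demo_ or DEMO_ is hypothetical and exists only for illustration: the
register is modeled as a plain variable, the vector number is a placeholder
(the real PERF_GUEST_MEDIATED_PMI_VECTOR is whatever the series defines), and
the "put" helper assumes the exit path simply restores NMI delivery, as the v4
patch quoted above does.  The APIC_* constants are meant to match
arch/x86/include/asm/apicdef.h.

```c
#include <assert.h>
#include <stdint.h>

/* Values intended to match arch/x86/include/asm/apicdef.h. */
#define APIC_DM_FIXED   0x00000u
#define APIC_DM_NMI     0x00400u
#define APIC_LVT_MASKED (1u << 16)

/* Placeholder vector, an assumption for this sketch only. */
#define DEMO_MEDIATED_PMI_VECTOR 0xf5u

/* Stand-in for the APIC_LVTPC register; real code uses apic_write(). */
static uint32_t demo_lvtpc;

static void demo_apic_write_lvtpc(uint32_t val)
{
	demo_lvtpc = val;
}

/*
 * Mirror of the proposed arch hook: only the mask bit is taken from the
 * guest's LVTPC value, the delivery mode and vector are fixed by the host.
 */
static void demo_load_guest_context(unsigned long data)
{
	uint32_t masked = (uint32_t)data & APIC_LVT_MASKED;

	/* The vector occupies bits 7:0 of the LVT entry. */
	assert((DEMO_MEDIATED_PMI_VECTOR & ~0xffu) == 0);

	demo_apic_write_lvtpc(APIC_DM_FIXED | DEMO_MEDIATED_PMI_VECTOR | masked);
}

/* Assumed exit path: hand the PMI back to the host as an NMI. */
static void demo_put_guest_context(void)
{
	demo_apic_write_lvtpc(APIC_DM_NMI);
}
```

With a guest LVTPC of 0x10400 (masked, NMI delivery), only bit 16 survives and
the sketch writes 0x100f5; with 0x400 (unmasked) it writes 0xf5.  The guest's
own delivery mode and vector never reach the hardware register, which is why
nothing beyond the mask bit needs to cross from KVM into perf.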

Holler if you have a better idea.  I'll plan on posting v5 in the next day or so
no matter what, so that it's not delayed for this one thing (it's already been
delayed more than I was hoping, and there are a lot of changes relative to v4).
