Message-ID: <20250815113951.GC4067720@noisy.programming.kicks-ass.net>
Date: Fri, 15 Aug 2025 13:39:51 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Marc Zyngier <maz@...nel.org>, Oliver Upton <oliver.upton@...ux.dev>,
	Tianrui Zhao <zhaotianrui@...ngson.cn>,
	Bibo Mao <maobibo@...ngson.cn>, Huacai Chen <chenhuacai@...nel.org>,
	Anup Patel <anup@...infault.org>,
	Paul Walmsley <paul.walmsley@...ive.com>,
	Palmer Dabbelt <palmer@...belt.com>,
	Albert Ou <aou@...s.berkeley.edu>, Xin Li <xin@...or.com>,
	"H. Peter Anvin" <hpa@...or.com>, Andy Lutomirski <luto@...nel.org>,
	Ingo Molnar <mingo@...hat.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Namhyung Kim <namhyung@...nel.org>,
	Paolo Bonzini <pbonzini@...hat.com>,
	linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.linux.dev,
	kvm@...r.kernel.org, loongarch@...ts.linux.dev,
	kvm-riscv@...ts.infradead.org, linux-riscv@...ts.infradead.org,
	linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
	Kan Liang <kan.liang@...ux.intel.com>,
	Yongwei Ma <yongwei.ma@...el.com>,
	Mingwei Zhang <mizhang@...gle.com>,
	Xiong Zhang <xiong.y.zhang@...ux.intel.com>,
	Sandipan Das <sandipan.das@....com>,
	Dapeng Mi <dapeng1.mi@...ux.intel.com>
Subject: Re: [PATCH v5 09/44] perf/x86: Switch LVTPC to/from mediated PMI
 vector on guest load/put context

On Wed, Aug 06, 2025 at 12:56:31PM -0700, Sean Christopherson wrote:
> Add arch hooks to the mediated vPMU load/put APIs, and use the hooks to
> switch PMIs to the dedicated mediated PMU IRQ vector on load, and back to
> perf's standard NMI when the guest context is put.  I.e. route PMIs to
> PERF_GUEST_MEDIATED_PMI_VECTOR when the guest context is active, and to
> NMIs while the host context is active.
> 
> While running with guest context loaded, ignore all NMIs (in perf).  Any
> NMI that arrives while the LVTPC points at the mediated PMU IRQ vector
> can't possibly be due to a host perf event.
> 
> Signed-off-by: Xiong Zhang <xiong.y.zhang@...ux.intel.com>
> Signed-off-by: Kan Liang <kan.liang@...ux.intel.com>
> Signed-off-by: Mingwei Zhang <mizhang@...gle.com>
> [sean: use arch hook instead of per-PMU callback]
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
>  arch/x86/events/core.c     | 27 +++++++++++++++++++++++++++
>  include/linux/perf_event.h |  3 +++
>  kernel/events/core.c       |  4 ++++
>  3 files changed, 34 insertions(+)
> 
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 7610f26dfbd9..9b0525b252f1 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -55,6 +55,8 @@ DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
>  	.pmu = &pmu,
>  };
>  
> +static DEFINE_PER_CPU(bool, x86_guest_ctx_loaded);
> +
>  DEFINE_STATIC_KEY_FALSE(rdpmc_never_available_key);
>  DEFINE_STATIC_KEY_FALSE(rdpmc_always_available_key);
>  DEFINE_STATIC_KEY_FALSE(perf_is_hybrid);
> @@ -1756,6 +1758,16 @@ perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
>  	u64 finish_clock;
>  	int ret;
>  
> +	/*
> +	 * Ignore all NMIs when a guest's mediated PMU context is loaded.  Any
> +	 * such NMI can't be due to a PMI as the CPU's LVTPC is switched to/from
> +	 * the dedicated mediated PMI IRQ vector while host events are quiesced.
> +	 * Attempting to handle a PMI while the guest's context is loaded will
> +	 * generate false positives and clobber guest state.
> +	 */
> +	if (this_cpu_read(x86_guest_ctx_loaded))
> +		return NMI_DONE;
> +
>  	/*
>  	 * All PMUs/events that share this PMI handler should make sure to
>  	 * increment active_events for their events.
> @@ -2727,6 +2739,21 @@ static struct pmu pmu = {
>  	.filter			= x86_pmu_filter,
>  };
>  
> +void arch_perf_load_guest_context(unsigned long data)
> +{
> +	u32 masked = data & APIC_LVT_MASKED;
> +
> +	apic_write(APIC_LVTPC,
> +		   APIC_DM_FIXED | PERF_GUEST_MEDIATED_PMI_VECTOR | masked);
> +	this_cpu_write(x86_guest_ctx_loaded, true);
> +}
> +
> +void arch_perf_put_guest_context(void)
> +{
> +	this_cpu_write(x86_guest_ctx_loaded, false);
> +	apic_write(APIC_LVTPC, APIC_DM_NMI);
> +}
> +
>  void arch_perf_update_userpage(struct perf_event *event,
>  			       struct perf_event_mmap_page *userpg, u64 now)
>  {
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 0c529fbd97e6..3a9bd9c4c90e 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1846,6 +1846,9 @@ static inline unsigned long perf_arch_guest_misc_flags(struct pt_regs *regs)
>  # define perf_arch_guest_misc_flags(regs)	perf_arch_guest_misc_flags(regs)
>  #endif
>  
> +extern void arch_perf_load_guest_context(unsigned long data);
> +extern void arch_perf_put_guest_context(void);
> +
>  static inline bool needs_branch_stack(struct perf_event *event)
>  {
>  	return event->attr.branch_sample_type != 0;
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index e1df3c3bfc0d..ad22b182762e 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -6408,6 +6408,8 @@ void perf_load_guest_context(unsigned long data)
>  		task_ctx_sched_out(cpuctx->task_ctx, NULL, EVENT_GUEST);
>  	}
>  
> +	arch_perf_load_guest_context(data);

So I still don't understand why this ever needs to reach the generic
code. The x86 PMU driver and x86 KVM can surely sort this out within x86,
no?
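
For readers following the thread, the load/put ordering in the quoted
arch/x86/events/core.c hunk can be modeled in user space. This is only a
sketch, not kernel code: struct model_cpu, the model_* functions, and the
MODEL_* constants are hypothetical stand-ins for the per-CPU flag, the
LVTPC register, and the APIC definitions. The vector number in particular
is an assumed placeholder, and the NMI handler is reduced to the early-out
check the patch adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in values for illustration; the real ones live in kernel headers. */
#define MODEL_LVT_MASKED   (1u << 16)
#define MODEL_DM_FIXED     0x000u
#define MODEL_DM_NMI       0x400u
#define MODEL_MEDIATED_VEC 0xf5u	/* assumed placeholder vector */

enum model_nmi_ret { MODEL_NMI_DONE = 0, MODEL_NMI_HANDLED = 1 };

/* One CPU's view: the LVTPC register plus the per-CPU "guest loaded" flag. */
struct model_cpu {
	uint32_t lvtpc;
	bool guest_ctx_loaded;
};

/* Load: point the LVTPC at the mediated vector first, then set the flag. */
static void model_load_guest_context(struct model_cpu *cpu, unsigned long data)
{
	uint32_t masked = (uint32_t)data & MODEL_LVT_MASKED;

	cpu->lvtpc = MODEL_DM_FIXED | MODEL_MEDIATED_VEC | masked;
	cpu->guest_ctx_loaded = true;
}

/* Put: clear the flag first, then route PMIs back to NMI delivery. */
static void model_put_guest_context(struct model_cpu *cpu)
{
	cpu->guest_ctx_loaded = false;
	cpu->lvtpc = MODEL_DM_NMI;
}

/* Simplified handler: ignore every NMI while a guest context is loaded. */
static enum model_nmi_ret model_perf_nmi(const struct model_cpu *cpu)
{
	if (cpu->guest_ctx_loaded)
		return MODEL_NMI_DONE;

	return MODEL_NMI_HANDLED;
}
```

In this model the flag is set only after the LVTPC has been switched away
from NMI delivery and is cleared before it is switched back, so the window
in which perf ignores NMIs never extends past the window in which the
LVTPC points at the mediated vector.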