linux-kernel - Re: [Patch v8 02/12] perf/x86/intel: Fix NULL event access and potential PEBS record loss


Message-ID: <aa0667d2-c0ad-4a40-898b-cf1363a0941f@linux.intel.com>
Date: Thu, 23 Oct 2025 10:29:27 +0800
From: "Mi, Dapeng" <dapeng1.mi@...ux.intel.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>, Arnaldo Carvalho de Melo
 <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
 Ian Rogers <irogers@...gle.com>, Adrian Hunter <adrian.hunter@...el.com>,
 Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
 Andi Kleen <ak@...ux.intel.com>, Eranian Stephane <eranian@...gle.com>,
 linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
 Dapeng Mi <dapeng1.mi@...el.com>, kernel test robot <oliver.sang@...el.com>,
 Kan Liang <kan.liang@...ux.intel.com>
Subject: Re: [Patch v8 02/12] perf/x86/intel: Fix NULL event access and
 potential PEBS record loss


On 10/22/2025 7:24 PM, Peter Zijlstra wrote:
> On Wed, Oct 22, 2025 at 04:12:14PM +0800, Mi, Dapeng wrote:
>
>> Having thought about this fix again, it seems the current fix is incomplete.
>> Besides the PEBS handler, the basic PMI handler could hit the same issue,
>> as in the code below from handle_pmi_common(),
>>
>>     for_each_set_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
>>         struct perf_event *event = cpuc->events[bit];
>>         u64 last_period;
>>
>>         handled++;
>>
>>         if (!test_bit(bit, cpuc->active_mask))
>>             continue;
>>
>> Although checking cpuc->active_mask keeps the NULL event from being
>> accessed, it also skips the overflow processing for these events, which
>> may cause data loss.
>>
>> Moreover, the current approach uses temporary variables to snapshot the
>> active events, and these may consume too much stack memory (384 bytes).
>>
>> So I enhanced the fix as below. Do you have any comments on this? Thanks.
> So I didn't like the previous and I like this even less. What about
> something like this instead?
>
> I quickly went through the cpuc->event[ users and they all either check
> active_mask or, in case of the PEBS stuff, check pebs_enabled mask
> (which should be a subset of active_mask).
>
> (the PEBS last case depends on count being 0 for all counters that are
> not set in pebs_enabled)

Yes.
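
To spell out the pattern being agreed on here, the following is a minimal
user-space sketch, not kernel code. All names (model_cpuc, model_event,
model_handle_status, MODEL_IDX_MAX) are hypothetical stand-ins, and the
plain uint64_t masks are an assumption made to keep it self-contained. It
only shows that events[] is dereferenced behind an active_mask check, and
that pebs_enabled is expected to be a subset of active_mask.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_IDX_MAX 64   /* hypothetical stand-in for X86_PMC_IDX_MAX */

/* Hypothetical, stripped-down event: just counts processed overflows. */
struct model_event {
	uint64_t overflows;
};

/* Hypothetical, stripped-down per-CPU state. */
struct model_cpuc {
	struct model_event *events[MODEL_IDX_MAX];
	uint64_t active_mask;   /* counters currently started */
	uint64_t pebs_enabled;  /* expected to be a subset of active_mask */
};

/*
 * Walk the overflow status bits. Every set bit counts as handled, but
 * events[bit] is only dereferenced when the bit is also in active_mask.
 */
static int model_handle_status(struct model_cpuc *cpuc, uint64_t status)
{
	int handled = 0;

	/* The subset relation the discussion relies on. */
	assert((cpuc->pebs_enabled & ~cpuc->active_mask) == 0);

	for (int bit = 0; bit < MODEL_IDX_MAX; bit++) {
		uint64_t mask = UINT64_C(1) << bit;

		if (!(status & mask))
			continue;

		handled++;

		if (!(cpuc->active_mask & mask))
			continue;   /* slot is never dereferenced here */

		cpuc->events[bit]->overflows++;
	}

	return handled;
}
```

In this model an inactive counter's overflow bit is still counted as
handled, but its slot is never read, so the slot's content (NULL or not)
does not matter to the handler.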


>
> WDYT?
>
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 745caa6c15a3..74479f9d6eed 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -1344,6 +1344,7 @@ static void x86_pmu_enable(struct pmu *pmu)
>  				hwc->state |= PERF_HES_ARCH;
>  
>  			x86_pmu_stop(event, PERF_EF_UPDATE);
> +			cpuc->events[hwc->idx] = NULL;
>  		}
>  
>  		/*
> @@ -1365,6 +1366,7 @@ static void x86_pmu_enable(struct pmu *pmu)
>  			 * if cpuc->enabled = 0, then no wrmsr as
>  			 * per x86_pmu_enable_event()
>  			 */
> +			cpuc->events[hwc->idx] = event;
>  			x86_pmu_start(event, PERF_EF_RELOAD);
>  		}
>  		cpuc->n_added = 0;
> @@ -1531,7 +1533,6 @@ static void x86_pmu_start(struct perf_event *event, int flags)
>  
>  	event->hw.state = 0;
>  
> -	cpuc->events[idx] = event;
>  	__set_bit(idx, cpuc->active_mask);
>  	static_call(x86_pmu_enable)(event);
>  	perf_event_update_userpage(event);
> @@ -1610,7 +1611,6 @@ void x86_pmu_stop(struct perf_event *event, int flags)
>  	if (test_bit(hwc->idx, cpuc->active_mask)) {
>  		static_call(x86_pmu_disable)(event);
>  		__clear_bit(hwc->idx, cpuc->active_mask);
> -		cpuc->events[hwc->idx] = NULL;
>  		WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
>  		hwc->state |= PERF_HES_STOPPED;
>  	}
> @@ -1648,6 +1648,7 @@ static void x86_pmu_del(struct perf_event *event, int flags)
>  	 * Not a TXN, therefore cleanup properly.
>  	 */
>  	x86_pmu_stop(event, PERF_EF_UPDATE);
> +	cpuc->events[event->hw.idx] = NULL;
>  
>  	for (i = 0; i < cpuc->n_events; i++) {
>  		if (event == cpuc->event_list[i])

This is a much prettier fix. Thanks.👍 

It looks good to me. I ran basic tests with this fix and didn't find any
issues. But considering this is such a fundamental change, I will do more
tests later.
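
For reference, here is how I read the ownership change in the diff above,
as a minimal user-space sketch rather than the actual kernel code. The names
(model_cpuc, model_event, model_assign, model_start, model_stop, model_del,
MODEL_NUM_COUNTERS) are hypothetical, and the model ignores everything
except the events[] slot and active_mask. The point it illustrates: start
and stop only toggle active_mask, while the events[] slot is set on
assignment and cleared only on delete (or when an event is moved).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NUM_COUNTERS 8   /* hypothetical counter count */

/* Hypothetical, stripped-down event: only remembers its counter index. */
struct model_event {
	int idx;
};

/* Hypothetical, stripped-down per-CPU state. */
struct model_cpuc {
	struct model_event *events[MODEL_NUM_COUNTERS];
	uint64_t active_mask;
};

/* Start only marks the counter active; it does not touch events[]. */
static void model_start(struct model_cpuc *c, struct model_event *e)
{
	c->active_mask |= UINT64_C(1) << e->idx;
}

/* Stop only clears the active bit; the events[] slot stays populated. */
static void model_stop(struct model_cpuc *c, struct model_event *e)
{
	c->active_mask &= ~(UINT64_C(1) << e->idx);
}

/* Assignment side: publish the slot first, then start the counter. */
static void model_assign(struct model_cpuc *c, struct model_event *e, int idx)
{
	assert(idx >= 0 && idx < MODEL_NUM_COUNTERS);
	e->idx = idx;
	c->events[idx] = e;
	model_start(c, e);
}

/* Delete side: stop the counter first, then clear the slot. */
static void model_del(struct model_cpuc *c, struct model_event *e)
{
	model_stop(c, e);
	c->events[e->idx] = NULL;
}
```

In this model, a stop that is not followed by a delete leaves events[idx]
pointing at the event, so code that still needs to look at the slot after
the stop does not see NULL.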

