Message-ID: <be90e73b-eedf-4ff0-a832-b579e003cac9@linux.intel.com>
Date: Thu, 16 Jan 2025 10:55:46 -0500
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...hat.com, acme@...nel.org, namhyung@...nel.org,
irogers@...gle.com, adrian.hunter@...el.com, linux-kernel@...r.kernel.org,
linux-perf-users@...r.kernel.org, ak@...ux.intel.com, eranian@...gle.com,
dapeng1.mi@...ux.intel.com
Subject: Re: [PATCH V9 3/3] perf/x86/intel: Support PEBS counters snapshotting
On 2025-01-16 6:47 a.m., Peter Zijlstra wrote:
> On Wed, Jan 15, 2025 at 10:43:18AM -0800, kan.liang@...ux.intel.com wrote:
>
>> Reviewed-by: Andi Kleen <ak@...ux.intel.com>
>> Reviewed-by: Ian Rogers <irogers@...gle.com>
>
> Did they really read all the various versions without spotting any of
> the problems, or are you just rubber stamping things at this point :/
>
> Best to drop review tags after serious changes to patches.
>
Sure.
>> Suggested-by: Peter Zijlstra (Intel) <peterz@...radead.org>
>> Signed-off-by: Kan Liang <kan.liang@...ux.intel.com>
>> ---
>
>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>> index 8f218ac0d445..79a4aad5a0a3 100644
>> --- a/arch/x86/events/core.c
>> +++ b/arch/x86/events/core.c
>> @@ -94,6 +94,8 @@ DEFINE_STATIC_CALL_NULL(x86_pmu_pebs_aliases, *x86_pmu.pebs_aliases);
>>
>> DEFINE_STATIC_CALL_NULL(x86_pmu_filter, *x86_pmu.filter);
>>
>> +DEFINE_STATIC_CALL_NULL(x86_pmu_late_config, x86_pmu_late_config);
>
> So all these static call are through struct x86_pmu (hence the naming),
> but not this one ?!?
I will add it in struct x86_pmu.
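For clarity, a small userspace sketch (plain C, not the kernel's static-call machinery; all names here are made up for illustration) of the pattern Peter is asking for: the optional hook lives in struct x86_pmu like the other callbacks, and a PMU that doesn't provide it is silently skipped, which is the behaviour static_call_cond() gives for a NULL static call.

```c
#include <stddef.h>

/* Illustrative stand-in for struct x86_pmu: the optional hook is a
 * member alongside the other callbacks, and may be left NULL. */
struct x86_pmu_sketch {
	void (*late_config)(void);	/* optional, may be NULL */
};

static int late_config_calls;

static void intel_late_config(void)
{
	late_config_calls++;
}

/* Mirrors static_call_cond(): a NULL hook is silently skipped. */
static void call_late_config(const struct x86_pmu_sketch *pmu)
{
	if (pmu->late_config)
		pmu->late_config();
}
```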
>
>> /*
>> * This one is magic, it will get called even when PMU init fails (because
>> * there is no PMU), in which case it should simply return NULL.
>> @@ -1329,7 +1331,16 @@ static void x86_pmu_enable(struct pmu *pmu)
>>  	}
>>
>>  	/*
>> -	 * step2: reprogram moved events into new counters
>> +	 * step2:
>> +	 * The late config (after counters are scheduled)
>> +	 * is required for some cases, e.g., PEBS counters
>> +	 * snapshotting. Because an accurate counter index
>> +	 * is needed.
>> +	 */
>> +	static_call_cond(x86_pmu_late_config)();
>> +
>> +	/*
>> +	 * step3: reprogram moved events into new counters
>>  	 */
>>  	for (i = 0; i < cpuc->n_events; i++) {
>>  		event = cpuc->event_list[i];
>
> Placement and naming are weird.
>
> These 2 loops migrate all counters that got a new place, the first loop
> taking out pre-existing counters that got a new place, the second loop
> setting up these same and new counters.
>
> Why do the pebs config in the middle, where the least number of counters
> are set-up?
>
> AFAICT all it really needs is cpuc->assignp[], which is unchanged
> throughout. You can call this at any time before clearing n_added,
> sticking it in the middle just seems weird.
I just tried to put it right before x86_pmu_start(RELOAD), since the
MSR is updated in x86_pmu_start().
But yes, any place before it is good.
>
> Then naming, config is typically what we do at event creation,
> x86_pmu_hw_config() like. This is not at all related, so perhaps pick a
> different name?
How about late_setup/extra_setup?
@@ -1300,6 +1300,9 @@ static void x86_pmu_enable(struct pmu *pmu)
 	if (cpuc->n_added) {
 		int n_running = cpuc->n_events - cpuc->n_added;
+
+		static_call_cond(x86_pmu_late_setup)();
+
 		/*
 		 * apply assignment obtained either from
 		 * hw_perf_group_sched_in() or x86_pmu_enable()
It can be put at the beginning of the cpuc->n_added block.
So perf does the late setup first, then moves and starts the
pre-existing and new events. Is that OK?
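To make the intended ordering concrete, a minimal userspace sketch (simplified stand-ins, not the real cpu_hw_events/x86_pmu structures) of doing the late setup at the top of the n_added block, before any counter is moved or started; it only reads the final assignment, which is already fixed once scheduling is done.

```c
#define MAX_EV 8

/* Simplified stand-in for cpu_hw_events: assign[] holds the final
 * counter index per event, hw_idx[] the currently programmed one. */
struct cpu_events {
	int n_events;
	int n_added;
	int assign[MAX_EV];
	int hw_idx[MAX_EV];
};

static int late_setup_saw[MAX_EV];

/* The late setup only depends on the final assignment, which is
 * unchanged from here on, so it can run before any counter moves. */
static void late_setup(const struct cpu_events *c)
{
	for (int i = 0; i < c->n_events; i++)
		late_setup_saw[i] = c->assign[i];
}

static void enable_events(struct cpu_events *c)
{
	if (c->n_added) {
		late_setup(c);		/* proposed placement */
		for (int i = 0; i < c->n_events; i++)
			c->hw_idx[i] = c->assign[i];	/* move + start */
		c->n_added = 0;
	}
}
```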
>
> Bah, we're so very close between x86_pmu::{enable, disable, assign}() :/
>
> Also, I think I found you another bug... Consider what happens to the
> counter value when we reschedule a HES_STOPPED counter, then we skip
> x86_pmu_start(RELOAD) on step2, which leave the counter value with
> 'random' crap from whatever was there last.
>
> But meanwhile you do program PEBS to sample it. That will happily sample
> this garbage.
>
> Hmm?
I'm not quite sure I understand the issue.
A HES_STOPPED counter should be a pre-existing counter that was
stopped for some reason, right? So perf doesn't need to re-configure
PEBS_DATA_CFG, since the idx is not changed.
The worry is that the HES_STOPPED counter is restarted with
x86_pmu_start(0), right?
There should be 3 places that invoke x86_pmu_start(0).
- In __perf_event_stop(). But according to the comments, it's only for
events with an AUX ring buffer. It should not be a problem for PEBS.
- Throttle handling. The event does stop(0) and then start(0). There
should be no 'random' garbage, since the counter has been stopped.
Everything should be just the same as when it was stopped.
- In perf_adjust_freq_unthr_events(), when the period is not
adjusted, there should be no 'random' garbage either.
Or do you mean that some 3rd-party tool changes the counter values
while it's stopped?
If so, I think we can force an update for the pebs-counter-event-group
as below.
@@ -1517,7 +1522,8 @@ static void x86_pmu_start(struct perf_event *event, int flags)
 	if (WARN_ON_ONCE(idx == -1))
 		return;

-	if (flags & PERF_EF_RELOAD) {
+	if (flags & PERF_EF_RELOAD ||
+	    is_pebs_counter_event_group(event)) {
 		WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE));
 		static_call(x86_pmu_set_period)(event);
 	}
@@ -1608,7 +1614,8 @@ void x86_pmu_stop(struct perf_event *event, int flags)
 		hwc->state |= PERF_HES_STOPPED;
 	}

-	if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
+	if (((flags & PERF_EF_UPDATE) || is_pebs_counter_event_group(event)) &&
+	    !(hwc->state & PERF_HES_UPTODATE)) {
 		/*
 		 * Drain the remaining delta count out of a event
 		 * that we are disabling:
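To make the proposed condition easy to check, a small sketch (plain C, illustrative helper names; flag values copied from linux/perf_event.h) of the flag logic in the two hunks above: a PEBS counter-event group is reloaded on start and drained on stop even when the caller did not pass PERF_EF_RELOAD / PERF_EF_UPDATE.

```c
/* Flag values as in linux/perf_event.h (copied for illustration). */
#define PERF_EF_RELOAD	0x02
#define PERF_EF_UPDATE	0x04

/* x86_pmu_start(): reload the period either when the caller asked
 * for it, or unconditionally for a PEBS counter-event group. */
static int should_reload(int flags, int is_pebs_counter_group)
{
	return (flags & PERF_EF_RELOAD) || is_pebs_counter_group;
}

/* x86_pmu_stop(): drain the remaining delta either when the caller
 * asked for it, or unconditionally for a PEBS counter-event group,
 * but never when the count is already up to date. */
static int should_update(int flags, int is_pebs_counter_group, int uptodate)
{
	return ((flags & PERF_EF_UPDATE) || is_pebs_counter_group) && !uptodate;
}
```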
Thanks,
Kan