linux-kernel - Re: [PATCH V4 1/5] perf/x86: Extend event update interface

Message-ID: <20240801140340.GF37996@noisy.programming.kicks-ass.net>
Date: Thu, 1 Aug 2024 16:03:40 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: kan.liang@...ux.intel.com
Cc: mingo@...nel.org, acme@...nel.org, namhyung@...nel.org,
	irogers@...gle.com, adrian.hunter@...el.com,
	alexander.shishkin@...ux.intel.com, linux-kernel@...r.kernel.org,
	ak@...ux.intel.com, eranian@...gle.com,
	Sandipan Das <sandipan.das@....com>,
	Ravi Bangoria <ravi.bangoria@....com>,
	silviazhao <silviazhao-oc@...oxin.com>
Subject: Re: [PATCH V4 1/5] perf/x86: Extend event update interface

On Wed, Jul 31, 2024 at 07:38:31AM -0700, kan.liang@...ux.intel.com wrote:
> From: Kan Liang <kan.liang@...ux.intel.com>
> 
> The current event update interface directly reads the values from the
> counter, but the values may not be the accurate ones users require. For
> example, the sample read feature wants the counter value of the member
> events when the leader event is overflow. But with the current
> implementation, the read (event update) actually happens in the NMI
> handler. There may be a small gap between the overflow and the NMI
> handler.

This...

> The new Intel PEBS counters snapshotting feature can provide
> the accurate counter value in the overflow. The event update interface
> has to be updated to apply the given accurate values.
> 
> Pass the accurate values via the event update interface. If the value is
> not available, still directly read the counter.

> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 12f2a0c14d33..07a56bf71160 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -112,7 +112,7 @@ u64 __read_mostly hw_cache_extra_regs
>   * Can only be executed on the CPU where the event is active.
>   * Returns the delta events processed.
>   */
> -u64 x86_perf_event_update(struct perf_event *event)
> +u64 x86_perf_event_update(struct perf_event *event, u64 *val)
>  {
>  	struct hw_perf_event *hwc = &event->hw;
>  	int shift = 64 - x86_pmu.cntval_bits;
> @@ -131,7 +131,10 @@ u64 x86_perf_event_update(struct perf_event *event)
>  	 */
>  	prev_raw_count = local64_read(&hwc->prev_count);
>  	do {
> -		rdpmcl(hwc->event_base_rdpmc, new_raw_count);
> +		if (!val)
> +			rdpmcl(hwc->event_base_rdpmc, new_raw_count);
> +		else
> +			new_raw_count = *val;
>  	} while (!local64_try_cmpxchg(&hwc->prev_count,
>  				      &prev_raw_count, new_raw_count));
>  

Does that mean the following is possible?

Two counters: C0 and C1, where C0 is a PEBS counter that also samples
C1.

  C0: overflow-with-PEBS-assist -> PEBS entry with counter value A
      (DS buffer threshold not reached)

  C1: overflow -> PMI -> x86_perf_event_update(C1, NULL)
      rdpmcl reads value 'A+d', and sets prev_raw_count

  C0: more assists, hit DS threshold -> PMI
      PEBS processing does x86_perf_event_update(C1, A)
      and sets prev_raw_count *backwards*

How is that sane?
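
To make the ordering concrete, here is a minimal userspace sketch of the sequence above. It is not kernel code: model_event, model_event_update() and replay_scenario() are hypothetical names, the 48-bit counter width and the values A = 1000, d = 50 are assumptions, and the cmpxchg loop and period reprogramming are left out. It only mirrors the "use *val if given, otherwise read the counter" logic from the hunk.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define MODEL_CNTVAL_BITS 48	/* assumed counter width */

struct model_event {
	uint64_t hw_counter;	/* stands in for what rdpmcl() would return */
	uint64_t prev_count;	/* stands in for hwc->prev_count */
	int64_t  count;		/* stands in for event->count */
};

/* Mirrors the patched logic: take *val if given, else "read the hardware". */
static int64_t model_event_update(struct model_event *ev, const uint64_t *val)
{
	const int shift = 64 - MODEL_CNTVAL_BITS;
	uint64_t new_raw = val ? *val : ev->hw_counter;
	uint64_t prev_raw = ev->prev_count;
	int64_t delta;

	assert(shift > 0 && shift < 64);
	ev->prev_count = new_raw;

	/*
	 * Sign-extend the counter-width difference. The difference is a
	 * multiple of 2^shift, so the division is exact.
	 */
	delta = (int64_t)((new_raw << shift) - (prev_raw << shift)) /
		((int64_t)1 << shift);
	ev->count += delta;
	return delta;
}

/* Replays the C0/C1 ordering; returns the delta of the final update. */
static int64_t replay_scenario(struct model_event *c1)
{
	const uint64_t A = 1000, d = 50;	/* assumed example values */
	uint64_t snapshot = A;
	int64_t delta;

	/* C0: PEBS assist records C1 == A, DS threshold not reached, no PMI. */
	c1->hw_counter = A;

	/* C1: PMI arrives later, the counter has advanced to A + d. */
	c1->hw_counter = A + d;
	delta = model_event_update(c1, NULL);
	printf("C1 PMI:    delta=%lld prev=%llu\n",
	       (long long)delta, (unsigned long long)c1->prev_count);

	/* C0: DS threshold PMI, PEBS processing replays the stale snapshot A. */
	delta = model_event_update(c1, &snapshot);
	printf("PEBS path: delta=%lld prev=%llu\n",
	       (long long)delta, (unsigned long long)c1->prev_count);
	return delta;
}
```

Called on a zero-initialised model_event, replay_scenario() returns a delta of -d from the final update: prev_count goes from A+d back to A and the accumulated count drops by d. That is the backwards step being asked about.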