linux-kernel - Re: [RESEND PATCH V2 1/4] perf/x86/intel: fix event update for auto-reload

Message-ID: <20180124122618.GH2249@hirez.programming.kicks-ass.net>
Date:   Wed, 24 Jan 2018 13:26:18 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     kan.liang@...el.com
Cc:     mingo@...hat.com, acme@...nel.org, linux-kernel@...r.kernel.org,
        tglx@...utronix.de, jolsa@...hat.com, eranian@...gle.com,
        ak@...ux.intel.com, Kan Liang <kan.liang@...ux.intel.com>
Subject: Re: [RESEND PATCH V2 1/4] perf/x86/intel: fix event update for
 auto-reload

On Mon, Jan 08, 2018 at 07:15:13AM -0800, kan.liang@...el.com wrote:

> The formula to calculate the event->count is as below:
>
>   event->count = period left from last time +
>                  (reload_times - 1) * reload_val +
>                  latency of PMI handler
> 
> prev_count is the last observed hardware counter value. Just the same as
> non-auto-reload, its absolute value is the period of the first record.
> It should not update with each reload. Because it doesn't 'observe' the
> hardware counter for each auto-reload.
> 
> For the second and later records, the period is exactly the reload
> value. Just need to simply add (reload_times - 1) * reload_val to
> event->count.
> 
> The calculation of the latency of PMI handler is a little bit different
> as non-auto-reload. Because the start point is -reload_value. It needs
> to be adjusted by adding reload_value.
> The period_left needs to do the same adjustment.

What's this about the PMI latency? We don't care about that in any other
situation, right? Sure, the PMI takes a bit of time, but we're not
correcting for that anywhere, so why start now?

> There is nothing need to do in x86_perf_event_set_period(). Because it
> is fixed period. The period_left is already adjusted.

Fixes tag is missing.

> Signed-off-by: Kan Liang <kan.liang@...ux.intel.com>
> ---
>  arch/x86/events/intel/ds.c | 69 ++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 67 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index 3674a4b..cc1f373 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -1251,17 +1251,82 @@ get_next_pebs_record_by_bit(void *base, void *top, int bit)
>  	return NULL;
>  }
>  
> +/*
> + * Specific intel_pmu_save_and_restart() for auto-reload.
> + */
> +static int intel_pmu_save_and_restart_reload(struct perf_event *event,
> +					     u64 reload_val,
> +					     int reload_times)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	int shift = 64 - x86_pmu.cntval_bits;
> +	u64 prev_raw_count, new_raw_count;
> +	u64 delta;
> +
> +	if ((reload_times == 0) || (reload_val == 0))
> +		return intel_pmu_save_and_restart(event);

Like Jiri, I find this confusing at best. If we need to call that one,
you shouldn't have called this function to begin with.

At best, have a WARN here or something.

> +
> +	/*
> +	 * Careful: an NMI might modify the previous event value.
> +	 *
> +	 * Our tactic to handle this is to first atomically read and
> +	 * exchange a new raw count - then add that new-prev delta
> +	 * count to the generic event atomically:
> +	 */

For now this seems to only get called from *drain_pebs* which afaict
only happens when we've disabled the PMU (either from sched_task or
PMI).

Now, you want to put this in the pmu::read() path, and that does not
disable the PMU. But I don't think we can drain the PEBS buffer while
it's active; that's too full of races, so even there you'll have to
disable stuff.

So I don't think this is correct/desired for this case.

> +again:
> +	prev_raw_count = local64_read(&hwc->prev_count);
> +	rdpmcl(hwc->event_base_rdpmc, new_raw_count);
> +
> +	if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
> +					new_raw_count) != prev_raw_count)
> +		goto again;
> +
> +	/*
> +	 * Now we have the new raw value and have updated the prev
> +	 * timestamp already. We can now calculate the elapsed delta
> +	 * (event-)time and add that to the generic event.
> +	 *
> +	 * Careful, not all hw sign-extends above the physical width
> +	 * of the count.
> +	 *
> +	 *   event->count = period left from last time +
> +	 *                  (reload_times - 1) * reload_val +
> +	 *                  latency of PMI handler
> +	 *
> +	 * The period left from last time can be got from -prev_count.
> +	 * The start points of counting is always -reload_val.
> +	 * So the real latency of PMI handler is reload_val + new_raw_count.
> +	 */

That is very confused, the PMI latency is utterly unrelated to anything
you do here.
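
For reference, the term the quoted comment labels "latency of PMI handler"
(reload_val + new_raw_count) can be computed on its own. What follows is a
minimal userspace sketch, not kernel code: sext_counter() and
events_since_reload() are hypothetical names, and it assumes, as the quoted
comment states, that the counter restarts at -reload_val and counts upwards.
Under that assumption the term is the number of events counted since the
most recent reload.

```c
#include <stdint.h>

/*
 * Hypothetical helper: sign-extend a raw counter value that is only
 * `bits` wide (cf. x86_pmu.cntval_bits in the quoted patch).
 * Assumes arithmetic right shift of signed values.
 */
static int64_t sext_counter(uint64_t raw, int bits)
{
	int shift = 64 - bits;

	return (int64_t)(raw << shift) >> shift;
}

/*
 * Events counted since the most recent auto-reload, assuming the
 * counter was reloaded to -reload_val and counts upwards.
 */
static int64_t events_since_reload(uint64_t new_raw, uint64_t reload_val,
				   int bits)
{
	return (int64_t)reload_val + sext_counter(new_raw, bits);
}
```

With a 48-bit counter and reload_val = 1000, a raw value representing -900
gives 100 events since the last reload.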

> +	delta = (reload_val << shift) + (new_raw_count << shift) -
> +		(prev_raw_count << shift);
> +	delta >>= shift;
> +
> +	local64_add(reload_val * (reload_times - 1), &event->count);
> +	local64_add(delta, &event->count);

And this is still wrong I think. Consider the case where !reload_times.

We can easily call pmu::read() twice in one period. In that case we
should increment count with (new - prev).

Only once we get a new sample and are known to have wrapped, do we need
to consider that wrap.
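
The accounting described above can be sketched in userspace C. This is an
illustration under stated assumptions, not the patch or a proposed
replacement: sign_extend_counter() and auto_reload_delta() are hypothetical
names, `wraps` stands for the number of reloads known to have happened since
the previous read, and the counter is assumed to restart at -period after
each wrap.

```c
#include <stdint.h>

/*
 * Hypothetical helper: sign-extend a raw counter value that is only
 * `bits` wide. Assumes arithmetic right shift of signed values.
 */
static int64_t sign_extend_counter(uint64_t raw, int bits)
{
	int shift = 64 - bits;

	return (int64_t)(raw << shift) >> shift;
}

/*
 * Increment to add to the event count between two reads.
 *
 * With wraps == 0 (two reads inside one period) this is simply
 * cur - prev. Each known wrap contributes one full period on top,
 * because the counter jumped back by `period` at every reload.
 */
static int64_t auto_reload_delta(uint64_t prev_raw, uint64_t cur_raw,
				 uint64_t period, int wraps, int bits)
{
	int64_t prev = sign_extend_counter(prev_raw, bits);
	int64_t cur = sign_extend_counter(cur_raw, bits);

	return (cur - prev) + (int64_t)period * wraps;
}
```

For example, with period = 1000 and a 48-bit counter: reading -1000 and then
-700 with no wrap yields 300; reading -400 and then -900 after two wraps
yields -500 + 2000 = 1500 (400 events to the first wrap, 1000 to the second,
then 100 more).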

> +	local64_sub(delta, &hwc->period_left);
> +
> +	return x86_perf_event_set_period(event);
> +}
> +
>  static void __intel_pmu_pebs_event(struct perf_event *event,
>  				   struct pt_regs *iregs,
>  				   void *base, void *top,
>  				   int bit, int count)
>  {
> +	struct hw_perf_event *hwc = &event->hw;
>  	struct perf_sample_data data;
>  	struct pt_regs regs;
>  	void *at = get_next_pebs_record_by_bit(base, top, bit);
>  
> -	if (!intel_pmu_save_and_restart(event) &&
> -	    !(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD))
> +	if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) {
> +		/*
> +		 * Now, auto-reload is only enabled in fixed period mode.
> +		 * The reload value is always hwc->sample_period.
> +		 * May need to change it, if auto-reload is enabled in
> +		 * freq mode later.
> +		 */
> +		intel_pmu_save_and_restart_reload(event, hwc->sample_period,
> +						  count);

Since you pass in @event, hwc->sample_period is already available to it,
no need to pass that in as well.

> +	} else if (!intel_pmu_save_and_restart(event))
>  		return;
>  
>  	while (count > 1) {