linux-kernel - Re: perf: wrong event->count report (Was: perf basic-test-aarch64 failures)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160217102555.GZ6357@twins.programming.kicks-ass.net>
Date:	Wed, 17 Feb 2016 11:25:55 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	Jiri Olsa <jolsa@...hat.com>
Cc:	Oleg Nesterov <oleg@...hat.com>,
	Pratyush Anand <panand@...hat.com>,
	Jeff Bastian <jbastian@...hat.com>,
	Michael Petlan <mpetlan@...hat.com>,
	linux-kernel@...r.kernel.org, arm-devel@...hat.com
Subject: Re: perf: wrong event->count report (Was: perf basic-test-aarch64
 failures)

On Wed, Feb 17, 2016 at 10:35:39AM +0100, Jiri Olsa wrote:
> On Wed, Feb 17, 2016 at 04:34:16AM +0100, Oleg Nesterov wrote:
> > Finally I reproduced... let me add CC's and reply to initial message. This has
> > nothing to do with arm/uprobes.
> > 
> > I simply can't understand how perf calculates ->total_time_enabled/running.
> > At all. But the problem is that
> > 
> > 	1. perf_event_enable_on_exec() does enable first, then event_sched_in().
> > 
> > 	   After that tstamp_enabled < tstamp_running
> > 
> > 	2. This means that after the next update_event_times()
> >            total_time_running < total_time_enabled
> > 
> >            again, I fail to understand these calculations, but this is what
> >            perf_event_read_value() reports to user-space.
> > 
> > 	3. /usr/bin/perf calls perf_counts_values__scale() which does
> > 
> > 		else if (count->run < count->ena) {
> > 			scaled = 1;
> > 			count->val = (u64)((double) count->val * count->ena / count->run + 0.5);
> > 		}
> > 
> > 	   and this is why you see the wrong number. count->val was correct but
> > 	   wrongly updated because  total_time_running < total_time_enabled.
> > 
> > I leave this to Peter and Jiri ;)
> 
> I did not notice the ther conversation wasn't public, reposting ;-)
> 
> jirka
> 
> ---
> ouch, I tested with fedora kernel.. I can reproduce with 4.5
> 
> Pratyush bisected this into following commit:
> [3e349507d12de93b08b0aa814fc2aa0dee91c5ba] perf: Fix perf_enable_on_exec() event scheduling
> 
> it seems the commit above introduced unwanted difference between
> counter's enabled and running times.. I'm checking on that ;-)

Does something like so work?

---

So prior to 3e349507d12d ("perf: Fix perf_enable_on_exec() event
scheduling") we used to call task_ctx_sched_out() before
event_enable_on_exec().

ctx_sched_out() will call update_context_time(), therefore
__perf_event_mark_enabled() would have an up-to-date ctx->time.

Now, not so much. So explicitly update the ctx time before calling
event_enable_on_exec().

ctx_resched() will again call update_context_time(), resulting in a
slight difference the other way (running > enabled), which doesn't make
any sense either, but that we can (and should) clip.


--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3173,6 +3173,10 @@ static void perf_event_enable_on_exec(in
 
 	cpuctx = __get_cpu_context(ctx);
 	perf_ctx_lock(cpuctx, ctx);
+
+	update_context_time(ctx);
+	update_cgrp_time_from_cpuctx(cpuctx);
+
 	list_for_each_entry(event, &ctx->event_list, event_entry)
 		enabled |= event_enable_on_exec(event, ctx);