lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170802171051.zlq5rgx3jqkkxpg7@hirez.programming.kicks-ass.net>
Date:   Wed, 2 Aug 2017 19:10:51 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Vince Weaver <vincent.weaver@...ne.edu>
Cc:     linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Stephane Eranian <eranian@...il.com>
Subject: Re: [patch] perf wrong enabled time after attach/enable/enable

On Tue, Jul 25, 2017 at 09:47:52PM -0400, Vince Weaver wrote:

> The problem happens with the following:
> 	fork a child, put it to sleep with SIGSTOP
> 	perf_event_open(), attaching to the pid of the child
> 	ENABLE/DISABLE/ENABLE/DISABLE/ENABLE
> 	read the results with the enabled/running multiplexing fields
> 
> Your "enabled" time will be a huge negative number, which will confuse any
> code depening on it (especially the rdpmc scaling code).
> 
> From what I can tell, when you DISABLE an event that's in a 
> PERF_EVENT_STATE_INACTIVE state, currently it exits early without
> updating event->tstamp_stopped.  So later when calculating the
> enabled time it is wrong and you get the negative number.
> The following quick hack fixes things, it just abuses the existing code
> that was there to handle a similar case when a filter failed.  The proper
> fix might be to just put the event->tstamp_stopped update before the
> early out code, but the commit log for the change that introduced that
> makes it sound a bit scary.

fa66f07aa1f0 ("perf_events: Fix time tracking for events with pid != -1 and cpu != -1")

is the commit that introduced it (the scary one just moved the && to the
right place).

> Anyway a reproducible test case for this is in my pref_event_tests tree,
> the "tests/rdpmc/rdpmc_attach_multi_enable" test.
> 
> Vince
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 426c2ffba16d..18eadc3b37d5 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -1812,8 +1812,7 @@ event_sched_out(struct perf_event *event,
>  	 * maintained, otherwise bogus information is return
>  	 * via read() for time_enabled, time_running:
>  	 */
> -	if (event->state == PERF_EVENT_STATE_INACTIVE &&
> -	    !event_filter_match(event)) {
> +	if (event->state == PERF_EVENT_STATE_INACTIVE) {
>  		delta = tstamp - event->tstamp_stopped;
>  		event->tstamp_running += delta;
>  		event->tstamp_stopped = tstamp;

It doesn't look entirely crazy.. just trying to understand why Stephane
added that filter clause.

Argh, that time accounting always hurts my brain, its so weird. I never
dared rewrite it for risk of breaking it all again, but maybe, just
maybe we should have.. :/


init(now):
 enabled = stopped = running = now;

in(now):
 running += now - stopped;

out(now):
 stopped = now;

read:
 *enabled = stopped - enabled;
 *running = stopped - running;


Simple example:

init(10):
 e = s = r = 10

in(11):
 r += 11 - 10; r=11

out(12):
 s = 12

read:
 *e = 12-10 = 2
 *r = 12-11 = 1

in(14):
 r += 14 - 12; r=13

out(16):
 s = 16

read:
 *e = 16 - 10 = 6
 *r = 16 - 13 = 3


which confusingly works :-)

So lets break that by out not updating stopped, then we get:


init(10):
 e = s = r = 10

in(11):
 r += 11 - 10; r=11

read:
 *e = 10-10 = 2
 *r = 10-11 = -1

in(14):
 r += 14 - 10; r=15

read:
 *e = 10 - 10 = 0
 *r = 10 - 15 = -5


That is negative for 'running', not enabled. But your test really does
report a negative 'enabled'.

Playing with that test it really is the IOC_DISABLE while STOP'ed that
messes things up.

Ah.. magic.. the below seems to fix things, hopefully it doesn't break
anything else.

---
 kernel/events/core.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 426c2ffba16d..5dbbd1f90b4f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2458,6 +2458,20 @@ perf_install_in_context(struct perf_event_context *ctx,
 }
 
 /*
+ * reverse update_event_times()
+ */
+void __perf_event_enable_time(struct perf_event *event, u64 tstamp)
+{
+	/*
+	 * total_time_enabled = tstamp_stopped - tstamp_enabled
+	 * total_time_running = tstamp_stopped - tstamp_running;
+	 */
+	event->tstamp_stopped = tstamp;
+	event->tstamp_enabled = tstamp - event->total_time_enabled;
+	event->tstamp_running = tstamp - event->total_time_running;
+}
+
+/*
  * Put a event into inactive state and update time fields.
  * Enabling the leader of a group effectively enables all
  * the group members that aren't explicitly disabled, so we
@@ -2471,10 +2485,11 @@ static void __perf_event_mark_enabled(struct perf_event *event)
 	u64 tstamp = perf_event_time(event);
 
 	event->state = PERF_EVENT_STATE_INACTIVE;
-	event->tstamp_enabled = tstamp - event->total_time_enabled;
+	__perf_event_enable_time(event, tstamp);
+
 	list_for_each_entry(sub, &event->sibling_list, group_entry) {
 		if (sub->state >= PERF_EVENT_STATE_INACTIVE)
-			sub->tstamp_enabled = tstamp - sub->total_time_enabled;
+			__perf_event_enable_time(sub, tstamp);
 	}
 }
 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ