[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170802171051.zlq5rgx3jqkkxpg7@hirez.programming.kicks-ass.net>
Date: Wed, 2 Aug 2017 19:10:51 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Vince Weaver <vincent.weaver@...ne.edu>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Stephane Eranian <eranian@...il.com>
Subject: Re: [patch] perf wrong enabled time after attach/enable/enable
On Tue, Jul 25, 2017 at 09:47:52PM -0400, Vince Weaver wrote:
> The problem happens with the following:
> fork a child, put it to sleep with SIGSTOP
> perf_event_open(), attaching to the pid of the child
> ENABLE/DISABLE/ENABLE/DISABLE/ENABLE
> read the results with the enabled/running multiplexing fields
>
> Your "enabled" time will be a huge negative number, which will confuse any
> code depening on it (especially the rdpmc scaling code).
>
> From what I can tell, when you DISABLE an event that's in a
> PERF_EVENT_STATE_INACTIVE state, currently it exits early without
> updating event->tstamp_stopped. So later when calculating the
> enabled time it is wrong and you get the negative number.
> The following quick hack fixes things, it just abuses the existing code
> that was there to handle a similar case when a filter failed. The proper
> fix might be to just put the event->tstamp_stopped update before the
> early out code, but the commit log for the change that introduced that
> makes it sound a bit scary.
fa66f07aa1f0 ("perf_events: Fix time tracking for events with pid != -1 and cpu != -1")
is the commit that introduced it (the scary one just moved the && to the
right place).
> Anyway a reproducible test case for this is in my pref_event_tests tree,
> the "tests/rdpmc/rdpmc_attach_multi_enable" test.
>
> Vince
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 426c2ffba16d..18eadc3b37d5 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -1812,8 +1812,7 @@ event_sched_out(struct perf_event *event,
> * maintained, otherwise bogus information is return
> * via read() for time_enabled, time_running:
> */
> - if (event->state == PERF_EVENT_STATE_INACTIVE &&
> - !event_filter_match(event)) {
> + if (event->state == PERF_EVENT_STATE_INACTIVE) {
> delta = tstamp - event->tstamp_stopped;
> event->tstamp_running += delta;
> event->tstamp_stopped = tstamp;
It doesn't look entirely crazy.. just trying to understand why Stephane
added that filter clause.
Argh, that time accounting always hurts my brain, its so weird. I never
dared rewrite it for risk of breaking it all again, but maybe, just
maybe we should have.. :/
init(now):
enabled = stopped = running = now;
in(now):
running += now - stopped;
out(now):
stopped = now;
read:
*enabled = stopped - enabled;
*running = stopped - running;
Simple example:
init(10):
e = s = r = 10
in(11):
r += 11 - 10; r=11
out(12):
s = 12
read:
*e = 12-10 = 2
*r = 12-11 = 1
in(14):
r += 14 - 12; r=13
out(16):
s = 16
read:
*e = 16 - 10 = 6
*r = 16 - 13 = 3
which confusingly works :-)
So lets break that by out not updating stopped, then we get:
init(10):
e = s = r = 10
in(11):
r += 11 - 10; r=11
read:
*e = 10-10 = 2
*r = 10-11 = -1
in(14):
r += 14 - 10; r=15
read:
*e = 10 - 10 = 0
*r = 10 - 15 = -5
That is negative for 'running', not enabled. But your test really does
report a negative 'enabled'.
Playing with that test it really is the IOC_DISABLE while STOP'ed that
messes things up.
Ah.. magic.. the below seems to fix things, hopefully it doesn't break
anything else.
---
kernel/events/core.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 426c2ffba16d..5dbbd1f90b4f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2458,6 +2458,20 @@ perf_install_in_context(struct perf_event_context *ctx,
}
/*
+ * reverse update_event_times()
+ */
+void __perf_event_enable_time(struct perf_event *event, u64 tstamp)
+{
+ /*
+ * total_time_enabled = tstamp_stopped - tstamp_enabled
+ * total_time_running = tstamp_stopped - tstamp_running;
+ */
+ event->tstamp_stopped = tstamp;
+ event->tstamp_enabled = tstamp - event->total_time_enabled;
+ event->tstamp_running = tstamp - event->total_time_running;
+}
+
+/*
* Put a event into inactive state and update time fields.
* Enabling the leader of a group effectively enables all
* the group members that aren't explicitly disabled, so we
@@ -2471,10 +2485,11 @@ static void __perf_event_mark_enabled(struct perf_event *event)
u64 tstamp = perf_event_time(event);
event->state = PERF_EVENT_STATE_INACTIVE;
- event->tstamp_enabled = tstamp - event->total_time_enabled;
+ __perf_event_enable_time(event, tstamp);
+
list_for_each_entry(sub, &event->sibling_list, group_entry) {
if (sub->state >= PERF_EVENT_STATE_INACTIVE)
- sub->tstamp_enabled = tstamp - sub->total_time_enabled;
+ __perf_event_enable_time(sub, tstamp);
}
}
Powered by blists - more mailing lists