Message-ID: <06dc5615-a3ba-cc7b-d172-a13601ba4d4d@linux.intel.com>
Date: Thu, 3 Aug 2017 18:58:41 +0300
From: Alexey Budankov <alexey.budankov@...ux.intel.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Andi Kleen <ak@...ux.intel.com>,
Kan Liang <kan.liang@...el.com>,
Dmitri Prokhorov <Dmitry.Prohorov@...el.com>,
Valery Cherepennikov <valery.cherepennikov@...el.com>,
Mark Rutland <mark.rutland@....com>,
Stephane Eranian <eranian@...gle.com>,
David Carrillo-Cisneros <davidcc@...gle.com>,
linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v6 2/3]: perf/core: use context tstamp_data for skipped
events on mux interrupt
On 03.08.2017 17:00, Peter Zijlstra wrote:
> On Wed, Aug 02, 2017 at 11:15:39AM +0300, Alexey Budankov wrote:
>> +struct perf_event_tstamp {
>> + /*
>> + * These are timestamps used for computing total_time_enabled
>> + * and total_time_running when the event is in INACTIVE or
>> + * ACTIVE state, measured in nanoseconds from an arbitrary point
>> + * in time.
>> + * enabled: the notional time when the event was enabled
>> + * running: the notional time when the event was scheduled on
>> + * stopped: in INACTIVE state, the notional time when the
>> + * event was scheduled off.
>> + */
>> + u64 enabled;
>> + u64 running;
>> + u64 stopped;
>> +};
>
>
> So I have the below (untested) patch, also see:
>
> https://lkml.kernel.org/r/20170802171051.zlq5rgx3jqkkxpg7@hirez.programming.kicks-ass.net
>
> And I don't think I fully agree with your description of running.
I copied this comment unchanged from its previous location.
> Despite its name tstamp_running is not in fact a time stamp afaict. It's
> more like an accumulator of running, but with an offset of stopped.
I see tstamp_running as a value that is subtracted from the timestamp (e.g. when
update_context_time() is called) to get the event's correct total timings:
total_time_enabled = timestamp - enabled
total_time_running = timestamp - running
For example, for a single thread with a single event, running on a
dual-core machine for 10 ticks and spending half the time on each core, we have:
For the event instance on the first core:
10 = total_time_enabled = timestamp[110] - enabled[100]
5 = total_time_running = timestamp[110] - running[100 + 1 + 1 + 1 + 1 + 1]
Each "+ 1" above accounts for a time the event instance doesn't get through
perf_event_filter(), in particular when the instance is for a CPU different
from the one that schedules it.
So 5/10 = 0.5, i.e. the event ran 50% of the time on the first core. The same holds for the second core.
Summing up the instance times gives the value reported to the user:
50% (first core) + 50% (second core) = 100% of event run time, i.e. the no-multiplexing case.
Without thread migration we would have:
For the first core, which runs the thread:
10 = total_time_enabled = timestamp[110] - enabled[100]
10 = total_time_running = timestamp[110] - running[100]
10/10 = 1, i.e. 100%
For the second core:
10 = total_time_enabled = timestamp[110] - enabled[100]
0 = total_time_running = timestamp[110] - running[100 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1]
0/10 = 0, i.e. 0%
100% + 0% == 100% of event run time
From this perspective the tstamp_running field does accumulate some time,
but it is more like tstamp_eligible_to_run, so:
total_time_running == elapsed - tstamp_eligible_to_run
>
> I'm always completely confused by the way this timekeeping is done.
>
> ---
> Subject: perf: Fix time on IOC_ENABLE
> From: Peter Zijlstra <peterz@...radead.org>
> Date: Thu Aug 3 15:42:09 CEST 2017
>
> Vince reported that when we do IOC_ENABLE/IOC_DISABLE while the task
> is in SIGSTOP'ed state, the timestamps go wobbly.
>
> It turns out we indeed fail to correctly account time while in 'OFF'
> state, and doing IOC_ENABLE without getting scheduled in exposes the
> problem.
>
> Thinking further about this problem, it occurred to me that we can
> suffer a similar fate when we migrate an uncore event between CPUs.
> The perf_event_install() on the 'new' CPU will do add_event_to_ctx(),
> which will reset all the time stamps, resulting in a subsequent
> update_event_times() overwriting the total_time_* fields with smaller
> values.
>
> Reported-by: Vince Weaver <vincent.weaver@...ne.edu>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> ---
> kernel/events/core.c | 36 +++++++++++++++++++++++++++++++-----
> 1 file changed, 31 insertions(+), 5 deletions(-)
>
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2217,6 +2217,33 @@ static int group_can_go_on(struct perf_e
> return can_add_hw;
> }
>
> +/*
> + * Complement to update_event_times(). This computes the tstamp_* values to
> + * continue 'enabled' state from @now. And effectively discards the time
> + * between the prior tstamp_stopped and now (as we were in the OFF state, or
> + * just switched (context) time base).
> + *
> + * This further assumes '@...nt->state == INACTIVE' (we just came from OFF) and
> + * cannot have been scheduled in yet. And going into INACTIVE state means
> + * '@...nt->tstamp_stopped = @now'.
> + *
> + * Thus given the rules of update_event_times():
> + *
> + * total_time_enabled = tstamp_stopped - tstamp_enabled
> + * total_time_running = tstamp_stopped - tstamp_running
> + *
> + * We can insert 'tstamp_stopped == now' and reverse them to compute new
> + * tstamp_* values.
> + */
> +static void __perf_event_enable_time(struct perf_event *event, u64 now)
> +{
> + WARN_ON_ONCE(event->state != PERF_EVENT_STATE_INACTIVE);
> +
> + event->tstamp_stopped = now;
> + event->tstamp_enabled = now - event->total_time_enabled;
> + event->tstamp_running = now - event->total_time_running;
> +}
> +
> static void add_event_to_ctx(struct perf_event *event,
> struct perf_event_context *ctx)
> {
> @@ -2224,9 +2251,7 @@ static void add_event_to_ctx(struct perf
>
> list_add_event(event, ctx);
> perf_group_attach(event);
> - event->tstamp_enabled = tstamp;
> - event->tstamp_running = tstamp;
> - event->tstamp_stopped = tstamp;
> + __perf_event_enable_time(event, tstamp);
> }
>
> static void ctx_sched_out(struct perf_event_context *ctx,
> @@ -2471,10 +2496,11 @@ static void __perf_event_mark_enabled(st
> u64 tstamp = perf_event_time(event);
>
> event->state = PERF_EVENT_STATE_INACTIVE;
> - event->tstamp_enabled = tstamp - event->total_time_enabled;
> + __perf_event_enable_time(event, tstamp);
> list_for_each_entry(sub, &event->sibling_list, group_entry) {
> + /* XXX should not be > INACTIVE if event isn't */
> if (sub->state >= PERF_EVENT_STATE_INACTIVE)
> - sub->tstamp_enabled = tstamp - sub->total_time_enabled;
> + __perf_event_enable_time(sub, tstamp);
> }
> }
>
>