[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1a924e35-9e37-df19-3a3d-6296e7d91820@codeaurora.org>
Date: Tue, 4 Oct 2016 10:11:21 -0700
From: Joonwoo Park <joonwoop@...eaurora.org>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Ingo Molnar <mingo@...hat.com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Steven Rostedt <rostedt@...dmis.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] perf sched: kill time stamp discrepancy between script
and latency
On 10/04/2016 05:04 AM, Namhyung Kim wrote:
> On Mon, Oct 03, 2016 at 03:04:48PM -0700, Joonwoo Park wrote:
>>
>>
>> On 09/30/2016 10:15 PM, Namhyung Kim wrote:
>>> Hi Joonwoo,
>>>
>>> On Wed, Sep 28, 2016 at 07:25:26PM -0700, Joonwoo Park wrote:
>>>> Perf sched latency is handy to find out the maximum sched latency and
>>>> the time stamp of the event. After running sched latency, if a found
>>>> latency looks suspicious it's quite reasonable to run perf script
>>>> subsequently and search with the time stamp given by perf sched latency
>>>> to continue further debugging. However, at present, it's possible the
>>>> time stamp given by perf sched latency cannot be found in the trace
>>>> output by perf script because perf sched latency converts the event
>>>> time from ns to ms as double float and prints it with printf which
>>>> does banker's rounding as opposed to perf script doesn't.
>>>>
>>>> For example:
>>>>
>>>> 0x750ff0 [0x80]: event: 9
>>>> <snip>
>>>> 2 1858303049520 0x750ff0 [0x80]: PERF_RECORD_SAMPLE(IP, 0x1): 15281/15281: 0xffffffff8162a63a period: 1 addr: 0
>>>> ... thread: hackbench:15281
>>>> <snip>
>>>>
>>>> $ perf sched -i perf.data latency | grep hackbench
>>>> hackbench:(401) + 3539.283 ms | 23347 | avg: 7.286 ms | max: 829.998 ms | max at: 1858.303050 s
>>>>
>>>> $ perf script -i perf.data | grep "1858\.303050"
>>>>
>>>> $ perf script -i perf.data | grep "1858\.303049"
>>>> hackbench 15281 [002] 1858.303049: sched:sched_switch: prev_comm=hackbench prev_pid=15281 prev_prio=120 prev_state=D ==> next_comm=hackbench next_pid=15603 next_prio=120
>>>>
>>>> Fix perf latency to print out time stamp without rounding to avoid such
>>>> discrepancy.
>>>>
>>>> Cc: Peter Zijlstra <peterz@...radead.org>
>>>> Cc: Ingo Molnar <mingo@...hat.com>
>>>> Cc: Arnaldo Carvalho de Melo <acme@...nel.org>
>>>> Cc: Alexander Shishkin <alexander.shishkin@...ux.intel.com>
>>>> Cc: Steven Rostedt <rostedt@...dmis.org>
>>>> Cc: Namhyung Kim <namhyung@...nel.org>
>>>> Cc: linux-kernel@...r.kernel.org
>>>> Signed-off-by: Joonwoo Park <joonwoop@...eaurora.org>
>>>> ---
>>>>
>>>> I was tempted to get rid of all u64 to double casting in the function
>>>> output_lat_thread but didn't because there is no data loss as of
>>>> today. Double float gives at least 15 significant decimal digits
>>>> precision while the function requires only 14 significant digits precision.
>>>>
>>>> $ python -c "print(len(str(int(0xffffffffffffffff / 1e6))))"
>>>> 14
>>>>
>>>> tools/lib/traceevent/event-parse.h | 1 +
>>>> tools/perf/builtin-sched.c | 12 ++++++++++--
>>>> 2 files changed, 11 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/tools/lib/traceevent/event-parse.h b/tools/lib/traceevent/event-parse.h
>>>> index 9ffde37..f42703c 100644
>>>> --- a/tools/lib/traceevent/event-parse.h
>>>> +++ b/tools/lib/traceevent/event-parse.h
>>>> @@ -174,6 +174,7 @@ struct pevent_plugin_option {
>>>>
>>>> #define NSECS_PER_SEC 1000000000ULL
>>>> #define NSECS_PER_USEC 1000ULL
>>>> +#define MSECS_PER_SEC 1000ULL
>>>>
>>>> enum format_flags {
>>>> FIELD_IS_ARRAY = 1,
>>>> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
>>>> index afa0576..e5cf51a 100644
>>>> --- a/tools/perf/builtin-sched.c
>>>> +++ b/tools/perf/builtin-sched.c
>>>> @@ -1190,6 +1190,7 @@ static void output_lat_thread(struct perf_sched *sched, struct work_atoms *work_
>>>> int i;
>>>> int ret;
>>>> u64 avg;
>>>> + u64 max_lat_at_sec, max_lat_at_msec;
>>>
>>> Isn't it usec rathen than msec? :)
>>
>> It's to contain three decimal digits which are msecs when 'max_lat_at' is
>> expressed in sec.
>> For example when max_lat_at = 1858303049520 which is 1858.3030495199998 sec,
>> max_lat_at_msec is meant to be 303.
>
> ??? But didn't you want to print it as 303049?
>
> Looking at the code, the 'max_lat' is a latency which was printed in
> msec (%9.3f ms) but the 'max_lat_at' is a timestamp which was printed
> in sec (%13.6f s). This is confusing but they use effectively same
> unit (usec) by using different number of digit after the period.
>
>>
>> I must admit variable's name is bit misleading. Maybe just secs, msecs are
>> better?
>> Also just noticed u64 isn't needed for msecs. Will size down.
>
> IIUC you wanted usec for timestamp rather than msec, aren't you?
Ugh. I was a dummy. My bad. Will fix this.
>
>>
>>>
>>>>
>>>> if (!work_list->nb_atoms)
>>>> return;
>>>> @@ -1212,11 +1213,18 @@ static void output_lat_thread(struct perf_sched *sched, struct work_atoms *work_
>>>>
>>>> avg = work_list->total_lat / work_list->nb_atoms;
>>>>
>>>> - printf("|%11.3f ms |%9" PRIu64 " | avg:%9.3f ms | max:%9.3f ms | max at: %13.6f s\n",
>>>> + /*
>>>> + * Avoid round up with printf to prevent event time discrepency
>>>> + * between sched script and latency.
>>>> + */
>>>> + max_lat_at_sec = work_list->max_lat_at / NSECS_PER_SEC;
>>>> + max_lat_at_msec = (work_list->max_lat_at -
>>>> + max_lat_at_sec * NSECS_PER_SEC) / MSECS_PER_SEC;
>>>> + printf("+%11.3f ms |%9" PRIu64 " | avg:%9.3f ms | max:%9.3f ms | max at: %6lu.%06lu s\n",
>>>
>>> Maybe you'd better to be in sync with the script code:
>>>
>>> if (PRINT_FIELD(TIME)) {
>>> nsecs = sample->time;
>>> secs = nsecs / NSECS_PER_SEC;
>>> nsecs -= secs * NSECS_PER_SEC;
>>> usecs = nsecs / NSECS_PER_USEC;
>>> if (nanosecs)
>>> printf("%5lu.%09llu: ", secs, nsecs);
>>> else
>>> printf("%5lu.%06lu: ", secs, usecs);
>>> }
>>
>> Apart from variable name, I'm not quite sure what to sync because sched
>> doesn't print in nsecs.
>> Maybe you just wanted for variable names in sync rather than logic?
>
> You found inconsistency between the two, and wanted to show same time.
> So wouldn't it be better factor out the code and reuse it from the
> both callsite? If you don't need nsec, you could make it an option.
>
> But this is just a general suggestion and maybe it's not worth doing
> here since the code is small.. So I won't insist on it strongly.
I see what you're suggesting. Will try to factor out the common part in
the next revision.
Thanks,
Joonwoo
>
> Thanks
> Namhyung
>
>>>
>>>
>>>> (double)work_list->total_runtime / 1e6,
>>>> work_list->nb_atoms, (double)avg / 1e6,
>>>> (double)work_list->max_lat / 1e6,
>>>> - (double)work_list->max_lat_at / 1e9);
>>>> + max_lat_at_sec, max_lat_at_msec);
>>>> }
>>>>
>>>> static int pid_cmp(struct work_atoms *l, struct work_atoms *r)
>>>> --
>>>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
>>>> hosted by The Linux Foundation
>>>>
Powered by blists - more mailing lists