[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1a924e35-9e37-df19-3a3d-6296e7d91820@codeaurora.org>
Date:   Tue, 4 Oct 2016 10:11:21 -0700
From:   Joonwoo Park <joonwoop@...eaurora.org>
To:     Namhyung Kim <namhyung@...nel.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] perf sched: kill time stamp discrepancy between script
 and latency
On 10/04/2016 05:04 AM, Namhyung Kim wrote:
> On Mon, Oct 03, 2016 at 03:04:48PM -0700, Joonwoo Park wrote:
>>
>>
>> On 09/30/2016 10:15 PM, Namhyung Kim wrote:
>>> Hi Joonwoo,
>>>
>>> On Wed, Sep 28, 2016 at 07:25:26PM -0700, Joonwoo Park wrote:
>>>> Perf sched latency is handy to find out the maximum sched latency and
>>>> the time stamp of the event.  After running sched latency, if a found
>>>> latency looks suspicious it's quite reasonable to run perf script
>>>> subsequently and search with the time stamp given by perf sched latency
>>>> to continue further debugging.  However, at present, it's possible the
>>>> time stamp given by perf sched latency cannot be found in the trace
>>>> output by perf script because perf sched latency converts the event
>>>> time from ns to ms as double float and prints it with printf which
>>>> does banker's rounding as opposed to perf script doesn't.
>>>>
>>>>   For example:
>>>>
>>>>    0x750ff0 [0x80]: event: 9
>>>>    <snip>
>>>>    2 1858303049520 0x750ff0 [0x80]: PERF_RECORD_SAMPLE(IP, 0x1): 15281/15281: 0xffffffff8162a63a period: 1 addr: 0
>>>>     ... thread: hackbench:15281
>>>>    <snip>
>>>>
>>>>     $ perf sched -i perf.data latency | grep hackbench
>>>>       hackbench:(401)       +   3539.283 ms |    23347 | avg:    7.286 ms | max:  829.998 ms | max at:   1858.303050 s
>>>>
>>>>     $ perf script -i perf.data | grep "1858\.303050"
>>>>
>>>>     $ perf script -i perf.data | grep "1858\.303049"
>>>>       hackbench 15281 [002]  1858.303049:       sched:sched_switch: prev_comm=hackbench prev_pid=15281 prev_prio=120 prev_state=D ==> next_comm=hackbench next_pid=15603 next_prio=120
>>>>
>>>> Fix perf latency to print out time stamp without rounding to avoid such
>>>> discrepancy.
>>>>
>>>> Cc: Peter Zijlstra <peterz@...radead.org>
>>>> Cc: Ingo Molnar <mingo@...hat.com>
>>>> Cc: Arnaldo Carvalho de Melo <acme@...nel.org>
>>>> Cc: Alexander Shishkin <alexander.shishkin@...ux.intel.com>
>>>> Cc: Steven Rostedt <rostedt@...dmis.org>
>>>> Cc: Namhyung Kim <namhyung@...nel.org>
>>>> Cc: linux-kernel@...r.kernel.org
>>>> Signed-off-by: Joonwoo Park <joonwoop@...eaurora.org>
>>>> ---
>>>>
>>>> I was tempted to get rid of all u64 to double casting in the function
>>>> output_lat_thread but didn't because there is no data loss as of
>>>> today.  Double float gives at least 15 significant decimal digits
>>>> precision while the function requires only 14 significant digits precision.
>>>>
>>>> $ python -c "print(len(str(int(0xffffffffffffffff / 1e6))))"
>>>> 14
>>>>
>>>>  tools/lib/traceevent/event-parse.h |  1 +
>>>>  tools/perf/builtin-sched.c         | 12 ++++++++++--
>>>>  2 files changed, 11 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/tools/lib/traceevent/event-parse.h b/tools/lib/traceevent/event-parse.h
>>>> index 9ffde37..f42703c 100644
>>>> --- a/tools/lib/traceevent/event-parse.h
>>>> +++ b/tools/lib/traceevent/event-parse.h
>>>> @@ -174,6 +174,7 @@ struct pevent_plugin_option {
>>>>
>>>>  #define NSECS_PER_SEC		1000000000ULL
>>>>  #define NSECS_PER_USEC		1000ULL
>>>> +#define MSECS_PER_SEC		1000ULL
>>>>
>>>>  enum format_flags {
>>>>  	FIELD_IS_ARRAY		= 1,
>>>> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
>>>> index afa0576..e5cf51a 100644
>>>> --- a/tools/perf/builtin-sched.c
>>>> +++ b/tools/perf/builtin-sched.c
>>>> @@ -1190,6 +1190,7 @@ static void output_lat_thread(struct perf_sched *sched, struct work_atoms *work_
>>>>  	int i;
>>>>  	int ret;
>>>>  	u64 avg;
>>>> +	u64 max_lat_at_sec, max_lat_at_msec;
>>>
>>> Isn't it usec rathen than msec? :)
>>
>> It's to contain three decimal digits which are msecs when 'max_lat_at' is
>> expressed in sec.
>> For example when max_lat_at = 1858303049520 which is 1858.3030495199998 sec,
>> max_lat_at_msec is meant to be 303.
>
> ???  But didn't you want to print it as 303049?
>
> Looking at the code, the 'max_lat' is a latency which was printed in
> msec (%9.3f ms) but the 'max_lat_at' is a timestamp which was printed
> in sec (%13.6f s).  This is confusing but they use effectively same
> unit (usec) by using different number of digit after the period.
>
>>
>> I must admit variable's name is bit misleading.  Maybe just secs, msecs are
>> better?
>> Also just noticed u64 isn't needed for msecs.  Will size down.
>
> IIUC you wanted usec for timestamp rather than msec, aren't you?
Ugh.  I was a dummy.  My bad.  Will fix this.
>
>>
>>>
>>>>
>>>>  	if (!work_list->nb_atoms)
>>>>  		return;
>>>> @@ -1212,11 +1213,18 @@ static void output_lat_thread(struct perf_sched *sched, struct work_atoms *work_
>>>>
>>>>  	avg = work_list->total_lat / work_list->nb_atoms;
>>>>
>>>> -	printf("|%11.3f ms |%9" PRIu64 " | avg:%9.3f ms | max:%9.3f ms | max at: %13.6f s\n",
>>>> +	/*
>>>> +	 * Avoid round up with printf to prevent event time discrepency
>>>> +	 * between sched script and latency.
>>>> +	 */
>>>> +	max_lat_at_sec = work_list->max_lat_at / NSECS_PER_SEC;
>>>> +	max_lat_at_msec = (work_list->max_lat_at -
>>>> +			   max_lat_at_sec * NSECS_PER_SEC) / MSECS_PER_SEC;
>>>> +	printf("+%11.3f ms |%9" PRIu64 " | avg:%9.3f ms | max:%9.3f ms | max at: %6lu.%06lu s\n",
>>>
>>> Maybe you'd better to be in sync with the script code:
>>>
>>> 	if (PRINT_FIELD(TIME)) {
>>> 		nsecs = sample->time;
>>> 		secs = nsecs / NSECS_PER_SEC;
>>> 		nsecs -= secs * NSECS_PER_SEC;
>>> 		usecs = nsecs / NSECS_PER_USEC;
>>> 		if (nanosecs)
>>> 			printf("%5lu.%09llu: ", secs, nsecs);
>>> 		else
>>> 			printf("%5lu.%06lu: ", secs, usecs);
>>> 	}
>>
>> Apart from variable name, I'm not quite sure what to sync because sched
>> doesn't print in nsecs.
>> Maybe you just wanted for variable names in sync rather than logic?
>
> You found inconsistency between the two, and wanted to show same time.
> So wouldn't it be better factor out the code and reuse it from the
> both callsite?  If you don't need nsec, you could make it an option.
>
> But this is just a general suggestion and maybe it's not worth doing
> here since the code is small..  So I won't insist on it strongly.
I see what you're suggesting.  Will try to factor out the common part in 
the next revision.
Thanks,
Joonwoo
>
> Thanks
> Namhyung
>
>>>
>>>
>>>>  	      (double)work_list->total_runtime / 1e6,
>>>>  		 work_list->nb_atoms, (double)avg / 1e6,
>>>>  		 (double)work_list->max_lat / 1e6,
>>>> -		 (double)work_list->max_lat_at / 1e9);
>>>> +		 max_lat_at_sec, max_lat_at_msec);
>>>>  }
>>>>
>>>>  static int pid_cmp(struct work_atoms *l, struct work_atoms *r)
>>>> --
>>>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
>>>> hosted by The Linux Foundation
>>>>
Powered by blists - more mailing lists
 
