Message-ID: <4E3AB66E.7070201@gmail.com>
Date: Thu, 04 Aug 2011 09:10:38 -0600
From: David Ahern <dsahern@...il.com>
To: Frederic Weisbecker <fweisbec@...il.com>,
Ingo Molnar <mingo@...e.hu>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Arnaldo Carvalho de Melo <acme@...hat.com>
CC: linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
paulus@...ba.org, tglx@...utronix.de
Subject: Re: [PATCH 3/6] perf: add reference time event

On 07/12/2011 08:30 AM, Frederic Weisbecker wrote:
> On Sun, Jul 10, 2011 at 10:20:29PM -0600, David Ahern wrote:
>> On 06/17/2011 08:17 AM, Frederic Weisbecker wrote:
>>> On Fri, Jun 17, 2011 at 08:04:59AM -0600, David Ahern wrote:
>>>>
>>>>
>>>> On 06/17/2011 07:32 AM, Frederic Weisbecker wrote:
>>>>> On Tue, Jun 07, 2011 at 05:55:46PM -0600, David Ahern wrote:
>>>>>> For initial perf_clock to time-of-day correlation.
>>>>>>
>>>>>> Signed-off-by: David Ahern <dsahern@...il.com>
>>>>>> ---
>>>>>> tools/perf/util/event.c | 1 +
>>>>>> tools/perf/util/event.h | 8 ++++++++
>>>>>> tools/perf/util/session.c | 4 ++++
>>>>>> tools/perf/util/session.h | 3 ++-
>>>>>> 4 files changed, 15 insertions(+), 1 deletions(-)
>>>>>>
>>>>>> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
>>>>>> index 3c1b8a6..1a89a04 100644
>>>>>> --- a/tools/perf/util/event.c
>>>>>> +++ b/tools/perf/util/event.c
>>>>>> @@ -24,6 +24,7 @@ static const char *perf_event__names[] = {
>>>>>> [PERF_RECORD_HEADER_TRACING_DATA] = "TRACING_DATA",
>>>>>> [PERF_RECORD_HEADER_BUILD_ID] = "BUILD_ID",
>>>>>> [PERF_RECORD_FINISHED_ROUND] = "FINISHED_ROUND",
>>>>>> + [PERF_RECORD_REFTIME] = "REF_TIME",
>>>>>> };
>>>>>>
>>>>>> const char *perf_event__name(unsigned int id)
>>>>>> diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
>>>>>> index 1d7f664..f481f90 100644
>>>>>> --- a/tools/perf/util/event.h
>>>>>> +++ b/tools/perf/util/event.h
>>>>>> @@ -98,6 +98,7 @@ enum perf_user_event_type { /* above any possible kernel type */
>>>>>> PERF_RECORD_HEADER_TRACING_DATA = 66,
>>>>>> PERF_RECORD_HEADER_BUILD_ID = 67,
>>>>>> PERF_RECORD_FINISHED_ROUND = 68,
>>>>>> + PERF_RECORD_REFTIME = 69,
>>>>>
>>>>> We would like to avoid adding more custom events like these. They were very convenient
>>>>> but they steal the kernel event type space. They are slated for removal in the long term.
>>>>>
>>>>> Another idea to achieve what you want would be to create a new perf event header feature,
>>>>> like HEADER_TRACE_INFO or HEADER_BUILD_ID are. Then use that to create a space in the perf
>>>>> file to save the initial values of those two clocks.
>>>>
>>>> you mean like this:
>>>> https://lkml.org/lkml/2010/12/7/813
>>>>
>>>> David
>>>
>>> Exactly, why did you change?
>>
>> Finally getting back to this.
>>
>> The answer to the 'why' is that putting a reference timestamp in the
>> header field does not work for file appends across reboots, i.e., the case:
>> perf record --tod ...
>> reboot
>> perf record -A --tod ...
>
> Damn append mode. I doubt that thing is really used. And it just complicates
> everything. It might be wise to get rid of it?
>
> Ingo, Peter, Arnaldo?
>
>> perf_clock timestamps change across reboots so the reference time
>> created by the first invocation is not valid for the append case. The
>> discussion then drifted towards having a kernel side event which per
>> past patch sets has its own issues.
>>
>> So to summarize the options proposed to date and issues with the proposals:
>> 1. reference timestamp in header
>> - does not work for appends across reboots
>>
>> 2. synthesized events
>> - preference against them
>>
>> 3. kernel side event
>> - cannot generate an initial sample (with counter value and
>> perf_clock timestamp) on demand - e.g., start of session; a proposal to
>> use an ioctl to add one to the event stream was shot down
>>
>> At this point the only idea that comes to mind is to use a combination
>> of 2 and 3: add the kernel side clock event
>> (https://lkml.org/lkml/2011/2/18/11), read the realtime clock counter,
>> read the monotonic clock timestamp (i.e., perf_clock value), and
>> synthesize a perf sample that is written to the file. The append case
>> (with mismatch in --tod options between record invocations) would be
>> handled by having the kernel side clock event in the event list
>> (perf_evlist__equal would fail if --tod was not used for all invocations).
>
> Actually you first have to face a deeper problem. Events are not stored
> in order in the stream; they are sorted in perf_session__process_events().
>
> The bunch of sorted events is flushed periodically and sent to the consumer.
>
> See flush_sample_queue().
>
> And this sorting is done on the sample->time timestamps. So events
> are first sorted on sample->time and only afterward do you have access to your
> gtod tracepoint samples. But if that gtod sample was taken after a reboot
> then its sample->time is not consistent with the rest. It is not sorted correctly
> and thus the reftime won't be updated at the right moment.
>
> So the problem is that the reftime update already depends on a consistent cpu
> timestamp.
>
> I can't think of a sane way to work around that. Sorting on gtod + cpu timestamp
> is not a solution because gtod can change.
>
> I'd rather propose to refuse append mode as long as we have any timestamp. That includes
> gtod but also sample timestamps. They are buggy if we reboot.
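
To make the ordering problem concrete, here is a small standalone sketch. It
is not perf code: struct sketch_event, sketch_cmp_time() and
sketch_boot_switches() are names made up for illustration, and the "boot"
field exists only so the result can be inspected. It assumes timestamps
restart from a low value after a reboot, sorts purely on the timestamp the
way the session code sorts on sample->time, and counts how often neighbouring
events in the sorted stream come from different boots.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a sample; not the perf data structure. */
struct sketch_event {
	uint64_t time;	/* perf_clock-style timestamp, restarts at each boot */
	int boot;	/* which boot recorded it (illustration only) */
};

/* Order two events by timestamp alone, as a time-only sort would. */
static int sketch_cmp_time(const void *a, const void *b)
{
	const struct sketch_event *ea = a, *eb = b;

	return (ea->time > eb->time) - (ea->time < eb->time);
}

/*
 * Sort the events by timestamp and return how many adjacent pairs in
 * the sorted result belong to different boots. A stream that kept the
 * two recordings apart would have exactly one such switch.
 */
static size_t sketch_boot_switches(struct sketch_event *ev, size_t n)
{
	size_t i, switches = 0;

	qsort(ev, n, sizeof(*ev), sketch_cmp_time);
	for (i = 1; i < n; i++)
		if (ev[i].boot != ev[i - 1].boot)
			switches++;
	return switches;
}
```

With first-boot events at 100, 200, 300 and second-boot events at 50, 150,
250, the sorted stream alternates between boots, so a reference time taken
at 50 in the second boot would be applied ahead of every first-boot sample.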

Arnaldo's sending patches, so I take it he's dug out from backlog. ;-)

Any objections to not allowing append mode for perf-record if samples
contain timestamps?
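
Roughly, the check I have in mind would look like the sketch below. This is
only an illustration under assumptions, not a patch: struct
sketch_record_opts and sketch_check_append() are hypothetical names,
SKETCH_SAMPLE_TIME is a local copy of the PERF_SAMPLE_TIME bit so the snippet
builds on its own, and --tod is the option proposed in this series.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Local copy of the PERF_SAMPLE_TIME bit from linux/perf_event.h,
 * redefined here only so the sketch builds standalone.
 */
#define SKETCH_SAMPLE_TIME (1ULL << 2)

/* Hypothetical, trimmed-down stand-in for the record options. */
struct sketch_record_opts {
	bool append_mode;	/* -A was given */
	bool tod;		/* --tod was given (proposed option) */
	uint64_t sample_type;	/* PERF_SAMPLE_* bits requested */
};

/*
 * Return 0 if the option combination is acceptable, -1 if append mode
 * should be refused because the samples would carry timestamps.
 */
static int sketch_check_append(const struct sketch_record_opts *opts)
{
	if (!opts->append_mode)
		return 0;
	if (opts->tod || (opts->sample_type & SKETCH_SAMPLE_TIME))
		return -1;
	return 0;
}
```

Appending would still be accepted when no timestamps are requested, which
keeps -A usable for plain counting-style recordings.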

David