Message-ID: <51CB1BA5.9010307@gmail.com>
Date: Wed, 26 Jun 2013 10:49:41 -0600
From: David Ahern <dsahern@...il.com>
To: Pawel Moll <pawel.moll@....com>,
Peter Zijlstra <peterz@...radead.org>,
Stephane Eranian <eranian@...gle.com>,
Ingo Molnar <mingo@...nel.org>
CC: LKML <linux-kernel@...r.kernel.org>,
Paul Mackerras <paulus@...ba.org>,
Anton Blanchard <anton@...ba.org>,
Will Deacon <Will.Deacon@....com>,
"ak@...ux.intel.com" <ak@...ux.intel.com>,
Pekka Enberg <penberg@...il.com>,
Steven Rostedt <rostedt@...dmis.org>,
Robert Richter <robert.richter@....com>,
tglx <tglx@...utronix.de>, John Stultz <john.stultz@...aro.org>
Subject: Re: [RFC] perf: need to expose sched_clock to correlate user samples with kernel samples
With all the perf ioctl extensions proposed over the past day or so, I
want to revive this request. We still need a solution to the problem of
correlating perf_clock to other clocks ...
On 2/1/13 7:18 AM, Pawel Moll wrote:
> Hello,
>
> I'd like to revive the topic...
>
> On Tue, 2012-10-16 at 18:23 +0100, Peter Zijlstra wrote:
>> On Tue, 2012-10-16 at 12:13 +0200, Stephane Eranian wrote:
>>> Hi,
>>>
>>> There are many situations where we want to correlate events happening at
>>> the user level with samples recorded in the perf_event kernel sampling buffer.
>>> For instance, we might want to correlate the call to a function or creation of
>>> a file with samples. Similarly, when we want to monitor a JVM with jitted code,
>>> we need to be able to correlate jitted code mappings with perf event samples
>>> for symbolization.
>>>
>>> Perf_events allows timestamping of samples with PERF_SAMPLE_TIME.
>>> That causes each PERF_RECORD_SAMPLE to include a timestamp
>>> generated by calling the local_clock() -> sched_clock_cpu() function.
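
For reference, requesting those timestamps from userspace looks roughly
like the sketch below. The software cpu-clock event and the period are
assumptions picked only for illustration, and init_timestamped_attr() is
a hypothetical helper name, not something from the perf tools.

```c
#include <linux/perf_event.h>
#include <string.h>

/* Hypothetical helper: fill in an attr whose samples carry a timestamp. */
static inline void init_timestamped_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;	/* assumption: any sampling event would do */
	attr->config = PERF_COUNT_SW_CPU_CLOCK;
	attr->sample_period = 1000000;		/* assumption: arbitrary period */
	/* PERF_SAMPLE_TIME adds a u64 time field to each PERF_RECORD_SAMPLE */
	attr->sample_type = PERF_SAMPLE_TIME | PERF_SAMPLE_TID;
	attr->disabled = 1;			/* start disabled, enable via ioctl later */
}
```

The attr would then be handed to perf_event_open(2) and the ring buffer
mmap'ed as usual; the time field in each sample is the value that has no
userspace-readable counterpart today.
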
>>>
>>> To make correlating user vs. kernel samples easy, we would need to
>>> access that sched_clock() functionality. However, none of the existing
>>> clock calls permit this at this point. They all return timestamps which are
>>> not using the same source and/or offset as sched_clock.
>>>
>>> I believe a similar issue exists with the ftrace subsystem.
>>>
>>> The problem needs to be addressed in a portable manner. Solutions
>>> based on reading TSC for the user level to reconstruct sched_clock()
>>> don't seem appropriate to me.
>>>
>>> One possibility to address this limitation would be to extend clock_gettime()
>>> with a new clock type, e.g., CLOCK_PERF.
>>>
>>> However, I understand that sched_clock_cpu() provides ordering guarantees only
>>> when invoked on the same CPU repeatedly, i.e., it's not globally synchronized.
>>> But we already have to deal with this problem when merging samples obtained
>>> from different CPU sampling buffers in per-thread mode. So this is not
>>> necessarily a showstopper.
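
The merge referred to here can be pictured as an ordinary two-way merge
keyed on the sample timestamp. This is only a sketch, assuming each
per-CPU stream is already time-ordered; struct sample and
merge_by_time() are made-up names rather than perf tool code, and
nothing here compensates for skew between CPUs.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative record: only the fields needed for ordering. */
struct sample {
	uint64_t time;	/* PERF_SAMPLE_TIME value */
	int cpu;	/* buffer the sample came from */
};

/*
 * Merge two time-ordered streams into out[], which must have room for
 * na + nb entries. On equal timestamps the entry from a[] goes first.
 * Returns the number of entries written.
 */
static inline size_t merge_by_time(const struct sample *a, size_t na,
				   const struct sample *b, size_t nb,
				   struct sample *out)
{
	size_t i = 0, j = 0, k = 0;

	while (i < na && j < nb)
		out[k++] = (b[j].time < a[i].time) ? b[j++] : a[i++];
	while (i < na)
		out[k++] = a[i++];
	while (j < nb)
		out[k++] = b[j++];
	return k;
}
```

A userspace event stamped with the same clock could be fed in as one
more stream, which is the point of exposing the clock at all.
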
>>>
>>> An alternative could be to use uprobes, but that's less practical to set up.
>>>
>>> Anyone with better ideas?
>>
>> You forgot to CC the time people ;-)
>>
>> I've no problem with adding CLOCK_PERF (or another/better name).
>>
>> Thomas, John?
>
> I've just faced the same issue - correlating an event in userspace with
> data from the perf stream, but to my mind what I want to get is a value
> returned by perf_clock() _in the current "session" context_.
>
> Stephane didn't like the idea of opening a "fake" perf descriptor in
> order to get the timestamp, but surely one must have the "session"
> already running to be interested in such data in the first place? So I
> think the ioctl() idea is not out of place here... How about the simple
> change below?
>
> Regards
>
> Pawel
>
> 8<---
> From 2ad51a27fbf64bf98cee190efc3fbd7002819692 Mon Sep 17 00:00:00 2001
> From: Pawel Moll <pawel.moll@....com>
> Date: Fri, 1 Feb 2013 14:03:56 +0000
> Subject: [PATCH] perf: Add ioctl to return current time value
>
> To correlate user space events with the perf events stream
> a current (as in: "what time(stamp) is it now?") time value
> must be made available.
>
> This patch adds a perf ioctl that makes this possible.
>
> Signed-off-by: Pawel Moll <pawel.moll@....com>
> ---
> include/uapi/linux/perf_event.h | 1 +
> kernel/events/core.c | 8 ++++++++
> 2 files changed, 9 insertions(+)
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 4f63c05..b745fb0 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -316,6 +316,7 @@ struct perf_event_attr {
> #define PERF_EVENT_IOC_PERIOD _IOW('$', 4, __u64)
> #define PERF_EVENT_IOC_SET_OUTPUT _IO ('$', 5)
> #define PERF_EVENT_IOC_SET_FILTER _IOW('$', 6, char *)
> +#define PERF_EVENT_IOC_GET_TIME _IOR('$', 7, __u64)
>
> enum perf_event_ioc_flags {
> PERF_IOC_FLAG_GROUP = 1U << 0,
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 301079d..4202b1c 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3298,6 +3298,14 @@ static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
>  	case PERF_EVENT_IOC_SET_FILTER:
>  		return perf_event_set_filter(event, (void __user *)arg);
>
> +	case PERF_EVENT_IOC_GET_TIME:
> +	{
> +		u64 time = perf_clock();
> +		if (copy_to_user((void __user *)arg, &time, sizeof(time)))
> +			return -EFAULT;
> +		return 0;
> +	}
> +
>  	default:
>  		return -ENOTTY;
>  	}
>