Message-Id: <B5145998-CD5C-4B1B-9A42-CD19691EF80B@163.com>
Date: Mon, 13 Jul 2015 22:29:14 +0800
From: pi3orama <pi3orama@....com>
To: Namhyung Kim <namhyung@...nel.org>
Cc: He Kuang <hekuang@...wei.com>,
Alexei Starovoitov <ast@...mgrid.com>,
"rostedt@...dmis.org" <rostedt@...dmis.org>,
"masami.hiramatsu.pt@...achi.com" <masami.hiramatsu.pt@...achi.com>,
"acme@...nel.org" <acme@...nel.org>,
"a.p.zijlstra@...llo.nl" <a.p.zijlstra@...llo.nl>,
"mingo@...hat.com" <mingo@...hat.com>,
"jolsa@...nel.org" <jolsa@...nel.org>,
"wangnan0@...wei.com" <wangnan0@...wei.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH v4 3/3] bpf: Introduce function for outputing data to perf event
Sent from my iPhone
> On Jul 13, 2015, at 10:09 PM, Namhyung Kim <namhyung@...nel.org> wrote:
>
>> On Mon, Jul 13, 2015 at 10:01:26PM +0800, pi3orama wrote:
>>
>>
>> Sent from my iPhone
>>
>>> On Jul 13, 2015, at 9:52 PM, Namhyung Kim <namhyung@...nel.org> wrote:
>>>
>>> Hi,
>>>
>>>> On Mon, Jul 13, 2015 at 12:36:27PM +0800, He Kuang wrote:
>>>> hi, Alexei
>>>>
>>>>>> On 2015/7/11 6:10, Alexei Starovoitov wrote:
>>>>>> On 7/10/15 3:03 AM, He Kuang wrote:
>>>>>> There're scenarios that we need an eBPF program to record not only
>>>>>> kprobe point args, but also the PMU counters, time latencies or the
>>>>>> number of cache misses between two probe points and other information
>>>>>> when the probe point is entered.
>>>>>>
>>>>>> This patch adds a new trace event to establish infrastructure for bpf to
>>>>>> output data to perf. Userspace perf tools can detect and use this event
>>>>>> as using the existing tracepoint events.
>>>>>>
>>>>>> New bpf trace event entry in debugfs:
>>>>>>
>>>>>> /sys/kernel/debug/tracing/events/bpf/bpf_output_data
>>>>>>
>>>>>> Userspace perf tools detect the new tracepoint event as:
>>>>>>
>>>>>> bpf:bpf_output_data [Tracepoint event]
>>>>>
>>>>> Nice! This approach looks cleanest so far.
>>>>>
>>>>>> +TRACE_EVENT(bpf_output_data,
>>>>>> +
>>>>>> + TP_PROTO(u64 *src, int len),
>>>>>> +
>>>>>> + TP_ARGS(src, len),
>>>>>> +
>>>>>> + TP_STRUCT__entry(
>>>>>> + __dynamic_array(u64, buf, len)
>>>>>> + ),
>>>>>> +
>>>>>> + TP_fast_assign(
>>>>>> + memcpy(__get_dynamic_array(buf), src, len * sizeof(u64));
>>>>>
>>>>> may be make it 'u8' array? The extra multiply and...
>>>>
>>>> OK
>>>>
>>>> So the output of three u64 integers (e.g. 0x2060572485, 0x20667b0ff2,
>>>> 0x623eb6d) will be this:
>>>>
>>>> dd 994 [000] 139.158180: bpf:bpf_output_data: 85 24 57 60 20 00 00 00
>>>> f2 0f 7b 66 20 00 00 00 6d eb 23 06 00 00 00 00
>>>>
>>>> And users are not restricted to u64 type elements. I'll change that.
>>>
>>> While this general event format works well, I think it might be hard
>>> to know which output came from which program when more than one bpf
>>> programs used.
>>>
>>> I was thinking about providing custom event formats for each bpf
>>> program (if needed). The event format definitions might be in a
>>> specific directory or a bpf object itself. Then perf can read those
>>> formats and print the output data according to the formats. Maybe we
>>> need to add some dynamic event id to match format and data.
>>
>> I think we can do it in perf side. Let BPF programs themselves
>> encode format information into the array and make perf read and
>> decode them. In kernel side simply support raw data should be
>> enough, so we can make kernel code as simple as possible.
>
> Yes, of course, I also meant that doing those work all in perf side. :)
>
I have an idea on it:
struct x {
	int a;
	int b;
};
struct x __x;

SEC(...)
int func(void *ctx)
{
	struct x myx;
	...
	myx.a = 1;
	myx.b = 2;
	OUTPUT(&myx, &__x);
	...
}
In the above program, the '&' operator on __x will emit a relocation, which gives libbpf a chance to learn the exact type of the output data. libbpf can then translate that type into a unique number, and the OUTPUT macro should pass the number along in the raw data. When decoding, perf reads the first word of the raw data to identify the format, and can then retrieve the structure layout from the DWARF information. We can add more macros to make the above code simpler.
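To make the record layout concrete, here is a rough userspace sketch of the "type number in the first word" idea. It is only an illustration under assumptions: encode_record(), decode_record() and TYPE_ID_X are hypothetical names, the ID is hard-coded rather than derived by libbpf from a relocation, and the decoder knows the layout of struct x directly instead of looking it up through DWARF.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type ID; in the proposal libbpf would assign this number. */
enum { TYPE_ID_X = 1 };

struct x {
	int a;
	int b;
};

/*
 * Build a raw record laid out as [u32 type id][payload bytes].
 * Returns the total number of bytes written to buf.
 */
static size_t encode_record(uint8_t *buf, size_t cap, uint32_t type_id,
			    const void *payload, size_t len)
{
	assert(cap >= sizeof(type_id) + len);
	memcpy(buf, &type_id, sizeof(type_id));
	memcpy(buf + sizeof(type_id), payload, len);
	return sizeof(type_id) + len;
}

/*
 * Read the first word to pick the format, then print the payload.
 * Returns 0 on success, -1 for a truncated record or an unknown type.
 * A real decoder would fetch the layout from DWARF instead of this switch.
 */
static int decode_record(const uint8_t *buf, size_t len,
			 char *out, size_t out_cap)
{
	uint32_t type_id;

	if (len < sizeof(type_id))
		return -1;
	memcpy(&type_id, buf, sizeof(type_id));

	switch (type_id) {
	case TYPE_ID_X: {
		struct x v;

		if (len - sizeof(type_id) < sizeof(v))
			return -1;
		memcpy(&v, buf + sizeof(type_id), sizeof(v));
		snprintf(out, out_cap, "x { a = %d, b = %d }", v.a, v.b);
		return 0;
	}
	default:
		return -1;
	}
}
```

With this layout the kernel side only ever sees an opaque byte array; everything about the format lives in the BPF object and in perf.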
We will start working on it after this patch gets accepted.
Thank you.
> Thanks,
> Namhyung