[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMEtUuy+B9aXP-8m7tA6aNxnS4SKRhMpfGEdWMcbxbj7ggOATw@mail.gmail.com>
Date: Tue, 10 Feb 2015 19:04:55 -0800
From: Alexei Starovoitov <ast@...mgrid.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Ingo Molnar <mingo@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
Arnaldo Carvalho de Melo <acme@...radead.org>,
Jiri Olsa <jolsa@...hat.com>,
Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
Linux API <linux-api@...r.kernel.org>,
Network Development <netdev@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
"Eric W. Biederman" <ebiederm@...ssion.com>
Subject: Re: [PATCH v3 linux-trace 1/8] tracing: attach eBPF programs to
tracepoints and syscalls
On Tue, Feb 10, 2015 at 4:50 PM, Steven Rostedt <rostedt@...dmis.org> wrote:
>
>> >> But some maintainers think of them as ABI, whereas others
>> >> are using them freely. imo it's time to remove ambiguity.
>> >
>> > I would love to, and have brought this up at Kernel Summit more than
>> > once with no solution out of it.
>>
>> let's try it again at plumbers in august?
>
> Well, we need a statement from Linus. And it would be nice if we could
> also get Ingo involved in the discussion, but he seldom comes to
> anything but Kernel Summit.
+1
> BTW, I wonder if I could make a simple compiler in the kernel that
> would translate the current ftrace filters into a BPF program, where it
> could use the program and not use the current filter logic.
yep. I've sent that patch last year.
It converted pred_tree into bpf program.
I can try to dig it up. It doesn't provide extra programmability
though, just makes filtering logic much faster.
>> imo the solution is DEFINE_EVENT_BPF that doesn't
>> print anything and a bpf program to process it.
>
> You mean to be completely invisible to ftrace? And the debugfs/tracefs
> directory?
I mean it will be seen in tracefs to get 'id', but without enable/format/filter
>> I'm not suggesting to preserve the meaning of 'pid' semantically
>> in all cases. That's not what users would want anyway.
>> I want to allow programs to access important fields and print
>> them in more generic way than current TP_printk does.
>> Then exposed ABI of such tracepoint_bpf is smaller than
>> with current tracepoints.
>
> Again, this would mean they become invisible to ftrace, and even
> ftrace_dump_on_oops.
yes, since these new tracepoints have no meat inside them.
They're placeholders sitting idle and waiting for bpf to do
something useful with them.
> I'm not fully understanding what is to be exported by this new ABI. If
> the fields available, will always be available, then why can't the
> appear in a TP_printk()?
say, we define trace_netif_rx_entry() as this new tracepoint_bpf.
It will have only one argument 'skb'.
bpf program will read and print skb fields the way it likes
for particular tracing scenario.
So instead of making
TP_printk("dev=%s napi_id=%#x queue_mapping=%u skbaddr=%p
vlan_tagged=%d vlan_proto=0x%04x vlan_tci=0x%04x protocol=0x%04x
ip_summed=%d hash=0x%08x l4_hash=%d len=%u data_len=%u truesize=%u
mac_header_valid=%d mac_header=%d nr_frags=%d gso_size=%d
gso_type=%#x",...
the abi exposed via trace_pipe (as it is today),
the new tracepoint_bpf abi is presence of 'skb' pointer as one
and only argument to bpf program.
Future refactoring of netif_rx would need to guarantee
that trace_netif_rx_entry(skb) is called. that's it.
imo such tracepoints are much easier to deal with during
code changes.
May be some of the existing tracepoints like this one that
takes one argument can be marked 'bpf-ready', so that
programs can attach to them only.
>> let's start slow then with bpf+syscall and bpf+kprobe only.
>
> I'm fine with that.
thanks. will wait for merge window to close and will repost.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists