netdev - Re: [PATCH v3 linux-trace 1/8] tracing: attach eBPF programs to tracepoints and syscalls

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20150210233159.748f44ad@grimm.local.home>
Date:	Tue, 10 Feb 2015 23:31:59 -0500
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Alexei Starovoitov <ast@...mgrid.com>
Cc:	Ingo Molnar <mingo@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
	Arnaldo Carvalho de Melo <acme@...radead.org>,
	Jiri Olsa <jolsa@...hat.com>,
	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
	Linux API <linux-api@...r.kernel.org>,
	Network Development <netdev@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Peter Zijlstra <peterz@...radead.org>,
	"Eric W. Biederman" <ebiederm@...ssion.com>
Subject: Re: [PATCH v3 linux-trace 1/8] tracing: attach eBPF programs to
 tracepoints and syscalls

On Tue, 10 Feb 2015 19:04:55 -0800
Alexei Starovoitov <ast@...mgrid.com> wrote:

> > You mean to be completely invisible to ftrace? And the debugfs/tracefs
> > directory?
> 
> I mean it will be seen in tracefs to get 'id', but without enable/format/filter

In other words, invisible to ftrace. I'm not sure I'll be on board to
work with that.

> > Again, this would mean they become invisible to ftrace, and even
> > ftrace_dump_on_oops.
> 
> yes, since these new tracepoints have no meat inside them.
> They're placeholders sitting idle and waiting for bpf to do
> something useful with them.

Hmm, I have a patch somewhere (never posted it), that add
TRACE_MARKER(), which basically would just print that it was hit. But
no data was passed to it. Something like that I would be more inclined
to take. Then the question is, what can bpf access there? Could just be
a place holder to add a "fast kprobe". This way, it can show up in
trace events with enable and and all that, it just wont have any data
to print out besides the normal pid, flags, etc.

But because it will inject a nop, that could be converted to a jump, it
will give you the power of kprobes but with the speed of a tracepoint.

> 
> > I'm not fully understanding what is to be exported by this new ABI. If
> > the fields available, will always be available, then why can't the
> > appear in a TP_printk()?
> 
> say, we define trace_netif_rx_entry() as this new tracepoint_bpf.
> It will have only one argument 'skb'.
> bpf program will read and print skb fields the way it likes
> for particular tracing scenario.
> So instead of making
> TP_printk("dev=%s napi_id=%#x queue_mapping=%u skbaddr=%p
> vlan_tagged=%d vlan_proto=0x%04x vlan_tci=0x%04x protocol=0x%04x
> ip_summed=%d hash=0x%08x l4_hash=%d len=%u data_len=%u truesize=%u
> mac_header_valid=%d mac_header=%d nr_frags=%d gso_size=%d
> gso_type=%#x",...
> the abi exposed via trace_pipe (as it is today),
> the new tracepoint_bpf abi is presence of 'skb' pointer as one
> and only argument to bpf program.
> Future refactoring of netif_rx would need to guarantee
> that trace_netif_rx_entry(skb) is called. that's it.
> imo such tracepoints are much easier to deal with during
> code changes.

But what can you access from that skb that is guaranteed to be there?
If you say anything, then there's no reason it can't be added to the
printk as well.

> 
> May be some of the existing tracepoints like this one that
> takes one argument can be marked 'bpf-ready', so that
> programs can attach to them only.

I really hate the idea of adding tracepoints that ftrace can't use. It
basically kills the entire busy box usage scenario, as boards that have
extremely limited userspace still make full use of ftrace via the
existing tracepoints.

I still don't see the argument that adding data via the bpf functions
is any different than adding those same items to fields in an event.
Once you add a bpf function, then you must maintain those fields.

Look, you can always add more to a TP_printk(), as that is standard
with all text file kernel parsing. Like stat in /proc. Fields can not
be removed, but more can always be appended to the end.

Any tool that parses trace_pipe is broken if it can't handle extended
fields. The api can be extended, and for text files, that is by
appending to them.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html