linux-kernel - Re: [RFC PATCH tip 0/5] tracing filters with BPF

Date:	Wed, 11 Dec 2013 18:48:42 -0800
From:	Alexei Starovoitov <ast@...mgrid.com>
To:	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>
Cc:	Ingo Molnar <mingo@...nel.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Andi Kleen <andi@...stfloor.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	"H. Peter Anvin" <hpa@...or.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Tom Zanussi <tom.zanussi@...ux.intel.com>,
	Jovi Zhangwei <jovi.zhangwei@...il.com>,
	Eric Dumazet <edumazet@...gle.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH tip 0/5] tracing filters with BPF

On Tue, Dec 10, 2013 at 7:35 PM, Masami Hiramatsu
<masami.hiramatsu.pt@...achi.com> wrote:
> (2013/12/11 11:32), Alexei Starovoitov wrote:
>> On Tue, Dec 10, 2013 at 7:47 AM, Ingo Molnar <mingo@...nel.org> wrote:
>>>
>>> * Alexei Starovoitov <ast@...mgrid.com> wrote:
>>>
>>>>> I'm fine if it becomes a requirement to have a vmlinux built with
>>>>> DEBUG_INFO to use BPF and have a tool like perf to translate the
>>>>> filters. But that must not replace what the current filters do
>>>>> now. That is, it can be an add on, but not a replacement.
>>>>
>>>> Of course. tracing filters via bpf is an additional tool for kernel
>>>> debugging. bpf by itself has use cases beyond tracing.
>>>
>>> Well, Steve has a point: forcing DEBUG_INFO is a big showstopper for
>>> most people.
>>
>> there is a misunderstanding here.
>> I was saying 'of course' to 'not replace current filter infra'.
>>
>> bpf does not depend on debug info.
>> That's the key difference between 'perf probe' approach and bpf filters.
>>
>> Masami is right that what I was trying to achieve with bpf filters
>> is similar to 'perf probe': insert a dynamic probe anywhere
>> in the kernel, walk pointers, data structures, print interesting stuff.
>>
>> 'perf probe' does it via scanning vmlinux with debug info.
>> bpf filters don't need it.
>> tools/bpf/trace/*_orig.c examples only depend on linux headers
>> in /lib/modules/../build/include/
>> Today bpf compiler struct layout is the same as x86_64.
>>
>> Tomorrow bpf compiler will have flags to adjust endianness, pointer size, etc
>> of the front-end. Similar to -m32/-m64 and -m*-endian flags.
>> Neat part is that I don't need to do any work, just enable it properly in
>> the bpf backend. From gcc/llvm point of view, bpf is yet another 'hw'
>> architecture that compiler is emitting code for.
>> So when C code of filter_ex1_orig.c does 'skb->dev', compiler determines
>> field offset by looking at /lib/modules/.../include/skbuff.h
>> whereas for 'perf probe' 'skb->dev' means walk debug info.
>
> Right, the offsets within the data structure can be obtained from the headers etc.
>
> However, how would the bpf get the register or stack assignment of
> skb itself? In the tracepoint macro, it will be able to get it from
> function parameters (it needs a trick, like jprobe does).
> I doubt you can do that on kprobes/uprobes without any debuginfo
> support. :(

The 4/5 diff actually shows how it works ;)
For kprobes it works at function entry, since the arguments are still
in registers, and the filter walks the pointers from there.
It cannot do func+line_number as perf-probe does, of course.
For tracepoints it's the same trick: call a no-inline func with the traceprobe args,
then call the inlined crash_setup_regs(), which stores the regs.
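
To make the kprobe case concrete, here is a minimal userspace sketch of the idea, not the code from the 4/5 diff. It assumes the x86_64 calling convention, where arg1 arrives in %rdi at function entry, and uses hypothetical mock types (mock_regs, mock_sk_buff, mock_net_device) in place of the real kernel structures. The field offset comes from the struct definition at compile time, which is the part that needs headers but no debug info.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the registers saved at probe time.
 * Only the six x86_64 integer-argument registers are modelled. */
struct mock_regs {
    uintptr_t rdi, rsi, rdx, rcx, r8, r9;
};

/* Hypothetical stand-ins for structs that would come from kernel headers. */
struct mock_net_device {
    char name[16];
};

struct mock_sk_buff {
    uint32_t len;
    struct mock_net_device *dev;
};

/* Load a pointer-sized field located `offset` bytes past `base`.
 * The offset is a compile-time constant taken from the struct layout. */
static uintptr_t load_ptr_field(uintptr_t base, size_t offset)
{
    uintptr_t value;

    memcpy(&value, (const void *)(base + offset), sizeof(value));
    return value;
}

/* Filter body: take arg1 (the skb) from %rdi, follow skb->dev,
 * and return the address of dev->name, or NULL if dev is not set. */
static const char *filter_dev_name(const struct mock_regs *regs)
{
    uintptr_t skb = regs->rdi;
    uintptr_t dev = load_ptr_field(skb, offsetof(struct mock_sk_buff, dev));

    assert(skb != 0);
    if (dev == 0)
        return NULL;
    return (const char *)(dev + offsetof(struct mock_net_device, name));
}
```

With regs.rdi pointing at a mock_sk_buff whose dev is named "eth0", filter_dev_name(&regs) returns that name. A real filter would go through a safe memory-read helper rather than dereferencing raw addresses.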

Of course, there are limitations. For example, the 7th func argument goes onto
the stack and requires more work to get out. If a struct is not defined in a .h,
it would need to be redefined in filter.c.
Corner cases, as you said.
Today the user of a bpf filter needs to know that arg1 goes into %rdi and so on.
That is easy to clean up.
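
One possible shape for that cleanup, sketched in userspace C under stated assumptions: a helper that hides the register names behind an argument index and falls back to the stack for the 7th argument onwards. The names mock_entry_regs and fetch_arg are hypothetical. The layout assumed is the x86_64 System V convention at function entry (return address at the stack pointer, stack-passed arguments right above it), and only integer or pointer arguments are handled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of the state at function entry. The stack
 * pointer is modelled as a pointer into an array of 64-bit slots. */
struct mock_entry_regs {
    uint64_t rdi, rsi, rdx, rcx, r8, r9;
    const uint64_t *rsp;
};

/* Return the n-th (1-based) integer/pointer argument.
 * Arguments 1..6 live in registers. At entry rsp[0] holds the return
 * address, so argument 7 is rsp[1], argument 8 is rsp[2], and so on. */
static uint64_t fetch_arg(const struct mock_entry_regs *regs, unsigned int n)
{
    assert(n >= 1);
    switch (n) {
    case 1: return regs->rdi;
    case 2: return regs->rsi;
    case 3: return regs->rdx;
    case 4: return regs->rcx;
    case 5: return regs->r8;
    case 6: return regs->r9;
    default: return regs->rsp[n - 6];
    }
}
```

A filter could then write fetch_arg(regs, 1) instead of naming %rdi directly. This only holds at function entry, before the prologue has moved the stack pointer.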

>> Another use case is to optimize fetch sequences of dynamic probes
>> as Masami suggested, but backward compatibility requirement
>> would preserve two ways of doing it as well.
>
> The backward compatibility issue is only for the interface, but not
> for the implementation, I think. :) The fetch method and filter
> pred do already parse the argument into a syntax tree. IMHO, bpf
> can optimize that tree to just a simple opcode stream.

ahh. yes. that's doable.
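
As a rough illustration of the tree-to-opcode-stream idea, here is a sketch that flattens a small predicate tree into a linear program for a toy stack machine. This is not the bpf instruction set and not the existing filter pred code; every type and opcode name below is hypothetical, and predicates are limited to "field == constant" joined by AND/OR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum node_kind { NODE_FIELD_EQ, NODE_AND, NODE_OR };

/* Hypothetical predicate tree node: a leaf compares record[field]
 * with value; inner nodes combine their left and right subtrees. */
struct node {
    enum node_kind kind;
    int field;
    int64_t value;
    const struct node *left, *right;
};

enum opcode { OP_PUSH_EQ, OP_AND, OP_OR };

struct insn {
    enum opcode op;
    int field;
    int64_t value;
};

/* Post-order walk: emit both children, then the operator.
 * Returns the new program length, or -1 if prog has no room left. */
static int emit(const struct node *n, struct insn *prog, int pos, int cap)
{
    if (n->kind != NODE_FIELD_EQ) {
        pos = emit(n->left, prog, pos, cap);
        if (pos < 0)
            return -1;
        pos = emit(n->right, prog, pos, cap);
        if (pos < 0)
            return -1;
    }
    if (pos >= cap)
        return -1;
    prog[pos].op = n->kind == NODE_FIELD_EQ ? OP_PUSH_EQ :
                   n->kind == NODE_AND ? OP_AND : OP_OR;
    prog[pos].field = n->field;
    prog[pos].value = n->value;
    return pos + 1;
}

#define MAX_STACK 16

/* Run the flat program against one record.
 * Returns 1 (match), 0 (no match), or -1 for a malformed program. */
static int run(const struct insn *prog, int len, const int64_t *record)
{
    int stack[MAX_STACK];
    int sp = 0;

    for (int i = 0; i < len; i++) {
        if (prog[i].op == OP_PUSH_EQ) {
            if (sp >= MAX_STACK)
                return -1;
            stack[sp++] = record[prog[i].field] == prog[i].value;
        } else {
            if (sp < 2)
                return -1;
            int b = stack[--sp];
            int a = stack[--sp];
            stack[sp++] = prog[i].op == OP_AND ? (a && b) : (a || b);
        }
    }
    return sp == 1 ? stack[0] : -1;
}
```

For a tree like (f0 == 1 && f1 == 2) || f2 == 3, emit() produces five instructions and run() evaluates them in a single pass with no recursion.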

Thanks
Alexei