netdev - Re: [RFCv3 00/19] x86/ftrace/bpf: Add batch support for direct/tracing attach

Message-ID: <4af931a5-3c43-9571-22ac-63e5d299fa42@fb.com>
Date:   Sat, 19 Jun 2021 09:19:57 -0700
From:   Yonghong Song <yhs@...com>
To:     Jiri Olsa <jolsa@...hat.com>,
        Andrii Nakryiko <andrii.nakryiko@...il.com>
CC:     Jiri Olsa <jolsa@...nel.org>, Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andriin@...com>,
        "Steven Rostedt (VMware)" <rostedt@...dmis.org>,
        Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...omium.org>, Daniel Xu <dxu@...uu.xyz>,
        Viktor Malik <vmalik@...hat.com>
Subject: Re: [RFCv3 00/19] x86/ftrace/bpf: Add batch support for
 direct/tracing attach



On 6/19/21 1:33 AM, Jiri Olsa wrote:
> On Thu, Jun 17, 2021 at 01:29:45PM -0700, Andrii Nakryiko wrote:
>> On Sat, Jun 5, 2021 at 4:12 AM Jiri Olsa <jolsa@...nel.org> wrote:
>>>
>>> hi,
>>> saga continues.. ;-) previous post is in here [1]
>>>
>>> After another discussion with Steven, he mentioned that if we fix
>>> the ftrace graph problem with direct functions, he'd be open to
>>> add batch interface for direct ftrace functions.
>>>
>>> He already had a proof-of-concept fix for that, which I took and broke
>>> up into several changes. I added the ftrace direct batch interface
>>> and bpf new interface on top of that.
>>>
>>> It's not so many patches after all, so I thought having them all
>>> together will help the review, because they are all connected.
>>> However I can break this up into separate patchsets if necessary.
>>>
>>> This patchset contains:
>>>
>>>    1) patches (1-4) that fix the ftrace graph tracing over the function
>>>       with direct trampolines attached
>>>    2) patches (5-8) that add batch interface for ftrace direct function
>>>       register/unregister/modify
>>>    3) patches (9-19) that add support to attach BPF program to multiple
>>>       functions
>>>
>>> In a nutshell:
>>>
>>> Ad 1) moves the graph tracing setup before the direct trampoline
>>> prepares the stack, so they don't clash
>>>
>>> Ad 2) uses ftrace_ops interface to register direct function with
>>> all functions in ftrace_ops filter.
>>>
>>> Ad 3) creates special program and trampoline type to allow attachment
>>> of multiple functions to single program.
>>>
>>> There are more detailed descriptions in the related changelogs.
>>>
>>> I have working bpftrace multi attachment code on top of this. I briefly
>>> checked retsnoop and I think it could use the new API as well.
>>
>> Ok, so I had a bit of time and enthusiasm to try that with retsnoop.
>> The ugly code is at [0] if you'd like to see what kind of changes I
>> needed to make to use this (it won't work if you check it out because
>> it needs your libbpf changes synced into submodule, which I only did
>> locally). But here are some learnings from that experiment both to
>> emphasize how important it is to make this work and how restrictive
>> are some of the current limitations.
>>
>> First, good news. Using this mass-attach API to attach to almost 1000
>> kernel functions goes from
>>
>> Plain fentry/fexit:
>> ===================
>> real    0m27.321s
>> user    0m0.352s
>> sys     0m20.919s
>>
>> to
>>
>> Mass-attach fentry/fexit:
>> =========================
>> real    0m2.728s
>> user    0m0.329s
>> sys     0m2.380s
> 
> I did not measure the bpftrace speedup, because the new code
> attached instantly ;-)
> 
>>
>> It's a 10x speed up. And a good chunk of those 2.7 seconds is in some
>> preparatory steps not related to fentry/fexit stuff.
>>
>> It's not exactly apples-to-apples, though, because the limitations you
>> have right now prevent attaching both fentry and fexit programs to
>> the same set of kernel functions. This makes it pretty useless for a
> 
> hum, you could do link_update with fexit program on the link fd,
> like in the selftest, right?
> 
>> lot of cases, in particular for retsnoop. So I haven't really tested
>> retsnoop end-to-end, I only verified that I do see fentries triggered,
>> but can't have matching fexits. So the speed-up might be smaller due
>> to additional fexit mass-attach (once that is allowed), but it's still
>> a massive difference. So we absolutely need to get this optimization
>> in.
>>
>> Few more thoughts, if you'd like to plan some more work ahead ;)
>>
>> 1. We need similar mass-attach functionality for kprobe/kretprobe, as
>> there are use cases where kprobes are more useful than fentry (e.g., >6
>> args funcs, or funcs with input arguments that are not supported by
>> BPF verifier, like struct-by-value). It's not clear how to best
>> represent this, given currently we attach kprobe through perf_event,
>> but we'll need to think about this for sure.
> 
> I'm fighting with the '2 trampolines concept' at the moment, but the
> mass attach for kprobes seems interesting ;-) will check
> 
>>
>> 2. To make mass-attach fentry/fexit useful for practical purposes, it
>> would be really great to have an ability to fetch traced function's
>> IP. I.e., if we fentry/fexit func kern_func_abc, bpf_get_func_ip()
>> would return the IP of that function, matching the one in
>> /proc/kallsyms. Right now I do very brittle hacks to do that.
> 
> so I hoped that we could store ip always in ctx-8 and have
> the bpf_get_func_ip helper to access that, but the BPF_PROG
> macro does not pass ctx value to the program, just args

ctx is passed to the bpf program. You can check the BPF_PROG
macro definition.
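
To illustrate the point, here is a minimal user-space sketch of the
wrapper pattern that BPF_PROG follows in libbpf's bpf_tracing.h: the
generated entry function takes the raw ctx pointer and forwards it,
together with the typed arguments unpacked from it, to the program
body. SKETCH_PROG, its fixed two-argument form, on_entry and demo_run
are hypothetical names made up for this example, not libbpf API. The
IP stored at ctx - 8 is only the idea floated above, not something
the trampoline is known to do.

```c
/* Hypothetical two-argument imitation of the BPF_PROG pattern.
 * The real macro accepts a variable number of arguments. */
#define SKETCH_PROG(name, t0, a0, t1, a1)                                    \
static unsigned long long name##_body(unsigned long long *ctx, t0 a0, t1 a1);\
static unsigned long long name(unsigned long long *ctx)                      \
{                                                                            \
	return name##_body(ctx, (t0)ctx[0], (t1)ctx[1]);                     \
}                                                                            \
static unsigned long long name##_body(unsigned long long *ctx, t0 a0, t1 a1)

/* Values recorded by the program so a caller can inspect them. */
static int seen_fd;
static long seen_len;

/* The body sees the typed arguments and ctx itself. */
SKETCH_PROG(on_entry, int, fd, long, len)
{
	seen_fd = fd;
	seen_len = len;
	/* Assumption: the slot just below ctx holds the function IP. */
	return ctx[-1];
}

/* Fake a trampoline stack: slot 0 is the IP, slots 1-2 are the args. */
unsigned long long demo_run(unsigned long long ip, int fd, long len)
{
	unsigned long long stack[3] = {
		ip, (unsigned long long)fd, (unsigned long long)len
	};

	return on_entry(&stack[1]);
}
```

Because ctx is an ordinary parameter of the body, the program can
read around it without any change to the macro. In this sketch that
is how on_entry reaches the slot at ctx - 8.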

> 
> we could perhaps somehow store the ctx in BPF_PROG before calling
> the bpf program, but I did not get to try that yet
> 
>>
>> So all-in-all, super excited about this, but I hope all those issues
>> are addressed to make retsnoop possible and fast.
>>
>>    [0] https://github.com/anakryiko/retsnoop/commit/8a07bc4d8c47d025f755c108f92f0583e3fda6d8
> 
> thanks for checking on this,
> jirka
>