netdev - Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe

Message-ID: <YHh6YeOPh0HIlb3e@krava>
Date:   Thu, 15 Apr 2021 19:39:45 +0200
From:   Jiri Olsa <jolsa@...hat.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Andrii Nakryiko <andrii.nakryiko@...il.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andriin@...com>,
        Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...omium.org>, Daniel Xu <dxu@...uu.xyz>,
        Jesper Brouer <jbrouer@...hat.com>,
        Toke Høiland-Jørgensen <toke@...hat.com>,
        Viktor Malik <vmalik@...hat.com>
Subject: Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe

On Thu, Apr 15, 2021 at 11:10:02AM -0400, Steven Rostedt wrote:

SNIP

> > > heya,
> > > I had some initial prototypes trying this way, but always ended up
> > > in complicated code, that's why I turned to ftrace_ops.
> > >
> > > let's see if it'll make any sense to you ;-)
> > >
> > > 1) so let's say we have extra trampoline for the program (which
> > > also seems a bit of a waste since there will be just a single record
> > 
> > BPF trampoline does more than just call the BPF program. At the very
> > least it saves input arguments so that the fexit program can access
> > them. But given it's one BPF trampoline attached to thousands of
> > functions, I don't see any problem there.
> 
> Note, there's a whole infrastructure that does similar things in ftrace.
> I wrote the direct call to jump to individual trampolines, because ftrace
> was too generic. The only way at the time to get to the arguments was via
> the ftrace_regs_caller, which did a full save of regs, because this was
> what kprobes needed, and was too expensive for BPF.
> 
> I now regret writing the direct callers, and instead should have just done
> what I did afterward, which was to make ftrace default to a lightweight
> trampoline that only saves enough for getting access to the arguments of
> the function. And have BPF use that. But I was under the impression that
> BPF needed fast access to a single function, and it would not become a
> generic trampoline for multiple functions, because that was the argument
> used to not enhance ftrace.
> 
> Today, ftrace by default (on x86) implements a generic way to get the
> arguments, and just the arguments, which is exactly what BPF would need for
> multiple functions. And yes, you even have access to the return code if you
> want to "hijack" it. And since it was originally for individual functions
> (and not a batch), I created the direct caller for BPF. But the direct
> caller will not be enhanced for multiple functions, as that's not its
> purpose. If you want a trampoline to be called back to multiple functions,
> then use the infrastructure that was designed for that. Which is what Jiri
> had proposed here.
> 
> And because the direct caller can mess with the return code, it breaks
> function graph tracing. As a temporary workaround, we just made function
> graph ignore any function that has a direct caller attached to it.
> 
> If you want batch processing of BPF programs, you need to first fix the
> function graph tracing issue, and allow both BPF attached callers and
> function graph to work on the same functions.
> 
> I don't know how the BPF code does it, but if you are tracing the exit
> of a function, I'm assuming that you hijack the return pointer and replace
> it with a call to a trampoline that has access to the arguments. To do

hi,
it's a bit different: the trampoline makes use of the fact that the
call to the trampoline is at the very beginning of the function, so
it can call the original function with a 'call function + 5' instruction.

so in a nutshell the trampoline does:

  call entry_progs
  call original_func+5
  call exit_progs

you can check this in arch/x86/net/bpf_jit_comp.c in more detail:

 * The assembly code when eth_type_trans is called from trampoline:
 *
 * push rbp
 * mov rbp, rsp
 * sub rsp, 24                     // space for skb, dev, return value
 * push rbx                        // temp regs to pass start time
 * mov qword ptr [rbp - 24], rdi   // save skb pointer to stack
 * mov qword ptr [rbp - 16], rsi   // save dev pointer to stack
 * call __bpf_prog_enter           // rcu_read_lock and preempt_disable
 * mov rbx, rax                    // remember start time if bpf stats are enabled
 * lea rdi, [rbp - 24]             // R1==ctx of bpf prog
 * call addr_of_jited_FENTRY_prog  // bpf prog can access skb and dev

entry program called ^^^

 * movabsq rdi, 64bit_addr_of_struct_bpf_prog  // unused if bpf stats are off
 * mov rsi, rbx                    // prog start time
 * call __bpf_prog_exit            // rcu_read_unlock, preempt_enable and stats math
 * mov rdi, qword ptr [rbp - 24]   // restore skb pointer from stack
 * mov rsi, qword ptr [rbp - 16]   // restore dev pointer from stack
 * call eth_type_trans+5           // execute body of eth_type_trans

original function called ^^^

 * mov qword ptr [rbp - 8], rax    // save return value
 * call __bpf_prog_enter           // rcu_read_lock and preempt_disable
 * mov rbx, rax                    // remember start time if bpf stats are enabled
 * lea rdi, [rbp - 24]             // R1==ctx of bpf prog
 * call addr_of_jited_FEXIT_prog   // bpf prog can access skb, dev, return value

exit program called ^^^

 * movabsq rdi, 64bit_addr_of_struct_bpf_prog  // unused if bpf stats are off
 * mov rsi, rbx                    // prog start time
 * call __bpf_prog_exit            // rcu_read_unlock, preempt_enable and stats math
 * mov rax, qword ptr [rbp - 8]    // restore eth_type_trans's return value
 * pop rbx
 * leave
 * add rsp, 8                      // skip eth_type_trans's frame
 * ret                             // return to its caller

> this you need a shadow stack to save the real return as well as the
> parameters of the function. This is something that I have patches that do
> similar things with function graph.
> 
> If you want this feature, let's work together and make this work for both
> BPF and ftrace.

it's been some time since I looked at the graph tracer; is there a way
to make it access the input arguments and make them available through
the ftrace_ops interface?

thanks,
jirka