netdev - Re: [PATCH bpf-next v4 18/27] bpf, x64: Store properly return value for trampoline with multi func programs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20210902215538.a75q7bjcgkpjync4@ast-mbp.dhcp.thefacebook.com>
Date:   Thu, 2 Sep 2021 14:55:38 -0700
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Jiri Olsa <jolsa@...hat.com>
Cc:     Andrii Nakryiko <andrii.nakryiko@...il.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andriin@...com>,
        "Steven Rostedt (VMware)" <rostedt@...dmis.org>,
        Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...omium.org>, Daniel Xu <dxu@...uu.xyz>,
        Viktor Malik <vmalik@...hat.com>
Subject: Re: [PATCH bpf-next v4 18/27] bpf, x64: Store properly return value
 for trampoline with multi func programs

On Thu, Sep 02, 2021 at 02:57:11PM +0200, Jiri Olsa wrote:
> > 
> > Let's say we have 5 kernel functions: a, b, c, d, e. Say a, b, c all
> > have 1 input args, and d and e have 2.
> > 
> > Now let's say we attach just normal fentry program A to function a.
> > Also we attach normal fexit program E to func e.
> > 
> > We'll have A  attached to a with trampoline T1. We'll also have E
> > attached to e with trampoline T2. Right?
> > 
> > And now we try to attach generic fentry (fentry.multi in your
> > terminology) prog X to all 5 of them. If A and E weren't attached,
> > we'd need two generic trampolines, one for a, b, c (because 1 input
> > argument) and another for d,e (because 2 input arguments). But because
> > we already have A and B attached, we'll end up needing 4:
> > 
> > T1 (1 arg)  for func a calling progs A and X
> > T2 (2 args) for func e calling progs E and X
> > T3 (1 arg)  for func b and c calling X
> > T4 (2 args) for func d calling X
> 
> so current code would group T3/T4 together, but if we keep
> them separated, then we won't need to use new model and
> cut off some of the code, ok

We've brainstormed this idea further with Andrii.
(thankfully we could do it in-person now ;) which saved a ton of time)

It seems the following should work:
5 kernel functions: a(int), b(long), c(void*), d(int, int), e(long, long).
fentry prog A is attached to 'a'.
fexit prog E is attached to 'e'.
multi-prog X wants to attach to all of them.
It can be achieved with 4 trampolines.

The trampolines called from funcs 'a' and 'e' can be patched to
call A+X and E+X programs correspondingly.
The multi program X needs to be able to access return values
and arguments of all functions it was attached to.
We can achieve that by always generating a trampoline (both multi and normal)
with extra constant stored in the stack. This constant is the number of
arguments served by this trampoline.
The trampoline 'a' will store nr_args=1.
The tramopline 'e' will store nr_args=2.
We need two multi trampolines.
The multi tramopline X1 that will serve 'b' and 'c' and store nr_args=1
and multi-tramopline X2 that will serve 'd' and store nr_args=2
into hidden stack location (like ctx[-2]).

The multi prog X can look like:
int BPF_PROG(x, __u64 arg1, __u64 arg2, __u64 ret)
in such case it will read correct args and ret when called from 'd' and 'e'
and only correct arg1 when called from 'a', 'b', 'c'.

To always correctly access arguments and the return value
the program can use two new helpers: bpf_arg(ctx, N) and bpf_ret_value(ctx).
Both will be fully inlined helpers similar to bpf_get_func_ip().
u64 bpf_arg(ctx, int n)
{
  u64 nr_args = ctx[-2]; /* that's the place where _all_ trampoline will store nr_args */
  if (n > nr_args)
    return 0;
  return ctx[n];
}
u64 bpf_ret_value(ctx)
{
  u64 nr_args = ctx[-2];
  return ctx[nr_args];
}

These helpers will be the only recommended way to access args and ret value
in multi progs.
The nice advantage is that normal fentry/fexit progs can use them too.

We can rearrange ctx[-1] /* func_ip */ and ctx[-2] /* nr_args */
if it makes things easier.

If multi prog knows that it is attaching to 100 kernel functions
and all of them have 2 arguments it can still do
int BPF_PROG(x, __u64 arg1, __u64 arg2, __u64 ret)
{ // access arg1, arg2, ret directly
and it will work correctly.

We can make it really strict in the verifier and disallow such
direct access to args from the multi prog and only allow
access via bpf_arg/bpf_ret_value helpers, but I think it's overkill.
Reading garbage values from stack isn't great, but it's not a safety issue.
It means that the verifier will allow something like 16 u64-s args
in multi program. It cannot allow large number, since ctx[1024]
might become a safety issue, while ctx[4] could be a garbage
or a valid value depending on the call site.

Thoughts?