netdev - Re: [PATCH bpf-next 1/4] bpf: enable program stats

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20190227040524.dsmrqpcbmf2pvebu@ast-mbp.dhcp.thefacebook.com>
Date:   Tue, 26 Feb 2019 20:05:26 -0800
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Daniel Borkmann <daniel@...earbox.net>
Cc:     Alexei Starovoitov <ast@...com>, Roman Gushchin <guro@...com>,
        Alexei Starovoitov <ast@...nel.org>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
        Kernel Team <Kernel-team@...com>
Subject: Re: [PATCH bpf-next 1/4] bpf: enable program stats

On Tue, Feb 26, 2019 at 04:29:45PM +0100, Daniel Borkmann wrote:
> On 02/26/2019 05:27 AM, Alexei Starovoitov wrote:
> > On 2/25/19 2:36 AM, Daniel Borkmann wrote:
> >>
> >> Not through the stack, but was more thinking something like low-overhead
> >> kprobes-style extension for a BPF prog where such sequence would be added
> >> 'inline' at beginning / exit of BPF prog invocation with normal ctx access
> >> and helpers as the native program (e.g. time-stamping plus read/write into
> >> mark as one example which kprobes couldn't do). But might go much beyond
> >> context of this stats collection.
> > 
> > see below.
> > 
> >> That's actually an interesting thought, given the original prog->bpf_func
> >> is a known address at that time, this could be templated where an inner
> >> dummy bpf_func call to the normal BPF prog gets later replaced with the
> >> actual prog->bpf_func address to have no indirect call. This would also
> >> allow to keep the actual prog->bpf_func entry call-sites small and simple.
> > 
> > My first thought was that we indeed can have a template of stats
> > collecting wrapper in .text,
> > then at 'switch stats on' event vmalloc exec region, copy wrapper code
> > into it and manually patch 'call foo' x86 insn with direct jump.
> > That is still quite involved from coding side.
> > It's similar to what kprobe/ftrace is doing, but probably
> > an overkill for stats due to potential security
> > concerns with new executable code.
> > But it won't work due to tail_calls.
> 
> You mean for the scenario to measure the _individual_ progs that
> are part of the given tail call processing such that each have
> their individual inc in count and time-spent attribution. If so,
> yeah, agree. Or for the case you have right now where everything
> gets attributed to the "entry" program that triggers the tail
> call processing? Latter would be similar as we have now in this
> patch, imho.

There are two other issues with 'epilogue' approach.
1.
We call progs as (*(prog)->bpf_func)(ctx, (prog)->insnsi)
and from R2 we can get back to prog and from there go to prog->aux-stats,
but there is no room in the stack to save this R2.
So we'd need extra 16-bytes for start_ns and saved R2
that epilogue can use to update stats in the 'entry' prog.
With static_key approach gcc was smart enough to use callee-saved
registers and no extra stack was used in the caller
(at least for one callsite of BPF_PROG_RUN where I've looked at asm)

2.
when stats are flipped on and off we need to make sure that all bpf progs
in prog_array are with stats on or off. Cannot mix and match.
Otherwise this 16-byte stack mismatch will cause issues.
Which is impossible to do atomically for the whole prog_array.

> > With static_key approach the main prog accounts the time
> > of all progs called via tail_call.
> > We can argue whether it's ideal semantics, but looks like
> > it's the only thing we can do.
> 
> Yeah true, it's more about measuring how long the prog 'chain'
> in the whole hook is taking to complete, which may be interesting
> in itself, just perhaps a bit confusing when looking at individual
> progs' share.

yep. would be good to come up with a way to measure individual progs,
but I couldn't figure it out yet.

> It's simplest, yep. I guess if we get to the point of removing
> indirect call for BPF_PROG_RUN itself, we might need to revisit
> then and move it into bpf_func at that point. Kind of implementation
> detail given we only expose count and time spent data to uapi,
> though the direct switch at runtime may be a bit problematic in
> future. Anyway, lets see how it goes when we get there.

yep