netdev - Re: [PATCH bpf-next 1/4] bpf: enable program stats

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ae7a96a3-d70e-1885-db59-f9d0a55fb636@iogearbox.net>
Date:   Tue, 26 Feb 2019 16:29:45 +0100
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Alexei Starovoitov <ast@...com>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     Roman Gushchin <guro@...com>, Alexei Starovoitov <ast@...nel.org>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
        Kernel Team <Kernel-team@...com>
Subject: Re: [PATCH bpf-next 1/4] bpf: enable program stats

On 02/26/2019 05:27 AM, Alexei Starovoitov wrote:
> On 2/25/19 2:36 AM, Daniel Borkmann wrote:
>>
>> Not through the stack, but was more thinking something like low-overhead
>> kprobes-style extension for a BPF prog where such sequence would be added
>> 'inline' at beginning / exit of BPF prog invocation with normal ctx access
>> and helpers as the native program (e.g. time-stamping plus read/write into
>> mark as one example which kprobes couldn't do). But might go much beyond
>> context of this stats collection.
> 
> see below.
> 
>> That's actually an interesting thought, given the original prog->bpf_func
>> is a known address at that time, this could be templated where an inner
>> dummy bpf_func call to the normal BPF prog gets later replaced with the
>> actual prog->bpf_func address to have no indirect call. This would also
>> allow to keep the actual prog->bpf_func entry call-sites small and simple.
> 
> My first thought was that we indeed can have a template of stats
> collecting wrapper in .text,
> then at 'switch stats on' event vmalloc exec region, copy wrapper code
> into it and manually patch 'call foo' x86 insn with direct jump.
> That is still quite involved from coding side.
> It's similar to what kprobe/ftrace is doing, but probably
> an overkill for stats due to potential security
> concerns with new executable code.
> But it won't work due to tail_calls.

You mean for the scenario to measure the _individual_ progs that
are part of the given tail call processing such that each have
their individual inc in count and time-spent attribution. If so,
yeah, agree. Or for the case you have right now where everything
gets attributed to the "entry" program that triggers the tail
call processing? Latter would be similar as we have now in this
patch, imho.

> I've looked into single wrapper approach with indirect call
> (because it's much easer than patching x86) and realized it
> won't work for the same reason.
> Inline kprobe+kretprobe insertion won't work either.
> All because we have tail_calls.
> we cannot have executable epilogue in progs.
> tail_call will jmp into next prog and second part of stats collection
> won't run. Essentially no epilogue allowed unless we disable tail_call.
> Or we somehow reimplement tail_calls so that prog returns,
> an epilogue is executed and then jumps into the next prog.
> tail_call turned out be the worst corner case feature to deal with.

Agree, it's a pain as in most cases. Was more thinking that
'special' epilogue would inc the count, start first ktime/cycle
snapshot and then do a direct function call to the normally
JITed BPF prog (where it can do tail calls, etc, so nothing
changes there), and upon return from this call take the second
measurement, and update the per CPU buffer, similar as we have
now just with static keys approach. So, was thinking similar as
we JIT BPF-to-BPF calls on an individual prog basis.

> With static_key approach the main prog accounts the time
> of all progs called via tail_call.
> We can argue whether it's ideal semantics, but looks like
> it's the only thing we can do.

Yeah true, it's more about measuring how long the prog 'chain'
in the whole hook is taking to complete, which may be interesting
in itself, just perhaps a bit confusing when looking at individual
progs' share.

> I don't see a way how tail_called progs can count their own time.

Me neither, if so I guess it would be of similar fashion as you
have right now where it's accounted to the first prog.

> Stats would somehow need to be computed right before jumping
> into next prog. Which means heavy changes in all JITs
> plus extra performance cost at runtime, since such if (stats_on)
> check in tail_call handling would have to be done at runtime,
> since JITed code is read-only and regenerating progs due to stats
> is non-starter.
> Or tail-called next prog would need to update stats for previous
> prog, but there is no pointer to 'aux->stats' there.
> So imo that is dead end.
> static_key approach is the simplest by far.

It's simplest, yep. I guess if we get to the point of removing
indirect call for BPF_PROG_RUN itself, we might need to revisit
then and move it into bpf_func at that point. Kind of implementation
detail given we only expose count and time spent data to uapi,
though the direct switch at runtime may be a bit problematic in
future. Anyway, lets see how it goes when we get there.

Thanks,
Daniel