Date:   Wed, 13 May 2020 22:59:05 -0700
From:   Andrii Nakryiko <andrii.nakryiko@...il.com>
To:     Alan Maguire <alan.maguire@...cle.com>
Cc:     Andrii Nakryiko <andriin@...com>, bpf <bpf@...r.kernel.org>,
        Networking <netdev@...r.kernel.org>,
        Alexei Starovoitov <ast@...com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Kernel Team <kernel-team@...com>,
        "Paul E . McKenney" <paulmck@...nel.org>,
        Jonathan Lemon <jonathan.lemon@...il.com>
Subject: Re: [PATCH bpf-next 1/6] bpf: implement BPF ring buffer and verifier
 support for it

On Wed, May 13, 2020 at 2:59 PM Alan Maguire <alan.maguire@...cle.com> wrote:
>
> On Wed, 13 May 2020, Andrii Nakryiko wrote:
>
> > This commit adds a new MPSC ring buffer implementation to the BPF ecosystem,
> > which allows multiple CPUs to submit data to a single shared ring buffer. On
> > the consumption side, only a single consumer is assumed.
> >
> > Motivation
> > ----------
> > There are two distinct motivators for this work, neither of which is
> > satisfied by the existing perf buffer, and which prompted the creation of a
> > new ring buffer implementation:
> >   - more efficient memory utilization by sharing a ring buffer across CPUs;
> >   - preserving the ordering of events that happen sequentially in time, even
> >   across multiple CPUs (e.g., fork/exec/exit events for a task).
> >
> > These two problems are independent, but the perf buffer fails to satisfy both.
> > Both are a result of the choice to have a per-CPU perf ring buffer. Both can
> > also be solved by an MPSC ring buffer implementation. The ordering problem
> > could technically be solved for the perf buffer with some in-kernel counting,
> > but given that the first problem requires an MPSC buffer anyway, the same
> > solution solves the second problem automatically.
> >
>
> This looks great, Andrii! One potentially interesting side effect of the way
> this is implemented is that it could (I think) support speculative tracing.
>
> Say I want to record some tracing info when I enter function foo(), but
> I only care about cases where that function later returns an error value.
> I _think_ your implementation could support that via a scheme like
> this:
>
> - attach a kprobe program to record the data via bpf_ringbuf_reserve(),
>   and store the reserved pointer value in a per-task keyed hashmap.
>   Then record the values of interest in the reserved space. This is our
>   speculative data, as we don't know yet whether we want to commit it.
>
> - attach a kretprobe program that picks up our reserved pointer and
>   commit()s or discard()s the associated data based on the return value.
>
> - the consumer should (I think) then only read the committed data, so in
>   this case just the data of interest associated with the failure case.
>
> I'm curious whether that sort of ringbuf access pattern across multiple
> programs would work. Thanks!


Right now it's not allowed. Similar to a spin lock or a socket reference, the
verifier will enforce that a reserved record is committed or discarded within
the same BPF program invocation. Technically, nothing prevents us from relaxing
this and allowing the reserved pointer to be stored in a map, but that's
probably way too dangerous and unnecessary for most common cases.
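
For illustration, the enforced shape is roughly the sketch below (untested; the
bpf_ringbuf_reserve()/bpf_ringbuf_submit()/bpf_ringbuf_discard() helpers and
BPF_MAP_TYPE_RINGBUF are the ones added in this series, while the struct, map,
and function names are made up):

#include "vmlinux.h"            /* assumes vmlinux.h generated from a kernel with this series */
#include <bpf/bpf_helpers.h>

char LICENSE[] SEC("license") = "GPL";

struct event {
	int pid;
};

struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 256 * 1024);
} rb SEC(".maps");

SEC("kprobe/foo")
int trace_foo(struct pt_regs *ctx)
{
	struct event *e;

	e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);
	if (!e)
		return 0;

	e->pid = bpf_get_current_pid_tgid() >> 32;

	/* the reservation has to be resolved before the program returns */
	if (!e->pid)
		bpf_ringbuf_discard(e, 0);
	else
		bpf_ringbuf_submit(e, 0);
	return 0;
}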

But all your troubles here come from using a kprobe+kretprobe pair. What I
think should solve your problem is a single fexit program. It can read the
input arguments *and* the return value of the traced function. So there is no
need for an additional map or for storing speculative data (and no speculation
either, because you'll know up front whether you even need to capture data).
Does this work for your case?
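
Roughly something like the sketch below (again untested and names made up; it
assumes a kernel function foo() that takes a single int argument and returns
an int error code, and it reuses the rb map and struct event from the previous
sketch):

#include <bpf/bpf_tracing.h>    /* for the BPF_PROG() convenience macro */

SEC("fexit/foo")
int BPF_PROG(trace_foo_exit, int arg, int ret)
{
	struct event *e;

	/* the return value is already known here, so only capture failures */
	if (ret >= 0)
		return 0;

	e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);
	if (!e)
		return 0;

	e->pid = bpf_get_current_pid_tgid() >> 32;
	bpf_ringbuf_submit(e, 0);
	return 0;
}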

>
> Alan
>

[...]

no one seems to like trimming emails ;)
