netdev - Re: [PATCH bpf-next 1/6] bpf: implement BPF ring buffer and verifier support for it


Message-ID: <CAKH8qBsy_DLhwie+g6o7yHEv_tBT5K2YCdjsn1j4KkhVvRSr5A@mail.gmail.com>
Date:   Thu, 14 May 2020 14:56:01 -0700
From:   Stanislav Fomichev <sdf@...gle.com>
To:     Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc:     Andrii Nakryiko <andriin@...com>, bpf <bpf@...r.kernel.org>,
        Networking <netdev@...r.kernel.org>,
        Alexei Starovoitov <ast@...com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Kernel Team <kernel-team@...com>,
        "Paul E . McKenney" <paulmck@...nel.org>,
        Jonathan Lemon <jonathan.lemon@...il.com>
Subject: Re: [PATCH bpf-next 1/6] bpf: implement BPF ring buffer and verifier
 support for it

On Thu, May 14, 2020 at 2:13 PM Andrii Nakryiko
<andrii.nakryiko@...il.com> wrote:
>
> On Thu, May 14, 2020 at 1:53 PM <sdf@...gle.com> wrote:
> >
> > On 05/14, Andrii Nakryiko wrote:
> > > On Thu, May 14, 2020 at 10:33 AM <sdf@...gle.com> wrote:
> > > >
> > > > On 05/13, Andrii Nakryiko wrote:
> >
> > [...]
> >
> > > > > + * void bpf_ringbuf_submit(void *data)
> > > > > + *   Description
> > > > > > + *           Submit reserved ring buffer sample, pointed to by *data*.
> > > > > + *   Return
> > > > > + *           Nothing.
> > > > Even though you mention self-pacing properties, would it still
> > > > make sense to add some argument to bpf_ringbuf_submit/bpf_ringbuf_output
> > > > to indicate whether to wake up userspace or not? Maybe something like
> > > > a threshold of number of outstanding events in the ringbuf after which
> > > > we do the wakeup? A default of 0/1 would preserve the existing behavior.
> > > >
> > > > The example I can give is a control plane userspace thread that
> > > > aggregates the events once a second; it doesn't care about millisecond
> > > > resolution. With the current scheme, I suppose, if BPF generates events
> > > > every 1ms, the userspace will be woken up 1000 times (if it can keep
> > > > up). Most of the time, we don't really care and some buffering
> > > > properties are desired.
> >
> > > perf buffer has a setting like this, and believe me, it's so confusing
> > > and dangerous that I wouldn't want this to be exposed. Even though I
> > > was aware of this behavior, I still had to debug and work around this
> > > lack of wakeups a few times; it's a really, really confusing feature.
> >
> > > In your case, though, why wouldn't user-space poll data just once a
> > > second, if it's not interested in getting data as fast as possible?
> > If I poll once per second I might lose the events if, for some reason,
> > there is a spike. I really want to have something like: "wakeup
> > userspace if the ringbuffer fill is over some threshold or
> > the last wakeup was too long ago". We currently do this via a percpu
> > cache map. IIRC, you've shared on lsfmmbpf that you do something like
> > that as well.
>
> Hm... don't remember such use case on our side. All applications I
> know of use default perf_buffer settings with no sampling.
Never mind, I might have misunderstood :-)

> > So I was thinking about how I can use the new ringbuf to remove the unneeded
> > copies and help with the reordering, but I'm a bit concerned about
> > regressing on the number of wakeups.
> >
> > Maybe having a flag like RINGBUF_NO_WAKEUP in bpf_ringbuf_submit()
> > will suffice? And if there is a helper or some way to obtain a
> > number of unconsumed items, I can implement my own flushing policy.
>
> Ok, I guess giving the application control at each discard/commit makes
> for ultimate flexibility. Let me add a flags argument to commit/discard
> and allow specifying a NO_WAKEUP flag. As for a count of unconsumed events
> -- that would be a bit expensive to maintain. How about the amount of data
> that's not consumed? It's obviously going to be racy, but returning
> (producer_pos - consumer_pos) should be sufficient for such
> smart and best-effort heuristics? WDYT?
Awesome, SGTM! Racy is fine (I don't see how we could make it non-racy
anyway). The amount of data instead of the number of items is also fine
since I know the size of the buffer.
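
For illustration, the flushing policy discussed above ("wake up userspace if
the ringbuffer fill is over some threshold or the last wakeup was too long
ago") could be sketched roughly as follows. This is a plain C model, not BPF
code and not part of the patch: the struct wakeup_policy type, its fields, and
the functions unconsumed_bytes() and should_wake() are hypothetical names
chosen for this sketch. The only piece taken from the thread is the racy
(producer_pos - consumer_pos) estimate of unconsumed data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy knobs; not part of the patch under discussion. */
struct wakeup_policy {
	uint64_t fill_threshold;  /* wake once this many bytes are unconsumed */
	uint64_t max_interval_ns; /* wake if the last wakeup is older than this */
};

/*
 * Best-effort estimate of unconsumed data, as proposed in the thread.
 * It is racy by nature: both positions can move while they are being read.
 */
static uint64_t unconsumed_bytes(uint64_t producer_pos, uint64_t consumer_pos)
{
	return producer_pos - consumer_pos;
}

/*
 * Decide whether the producer should wake the consumer now, or submit
 * without a wakeup and let more data accumulate.
 */
static bool should_wake(const struct wakeup_policy *p,
			uint64_t producer_pos, uint64_t consumer_pos,
			uint64_t now_ns, uint64_t last_wakeup_ns)
{
	assert(p != NULL);

	/* Enough data is pending: wake up regardless of timing. */
	if (unconsumed_bytes(producer_pos, consumer_pos) >= p->fill_threshold)
		return true;

	/* Otherwise wake only if the consumer has been left alone too long. */
	return now_ns - last_wakeup_ns >= p->max_interval_ns;
}
```

With a policy like this, a producer would presumably submit with the
NO_WAKEUP flag whenever should_wake() returns false, so a once-a-second
consumer still gets woken early during a spike. Since I know the size of the
buffer, the threshold can be picked as a fraction of it.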