linux-kernel - Re: [PATCH v5 3/4] bpf: Add libbpf logic for user-space ring buffer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YyjYVUetY0OL/9sX@maniforge.dhcp.thefacebook.com>
Date:   Mon, 19 Sep 2022 16:00:05 -0500
From:   David Vernet <void@...ifault.com>
To:     Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc:     ast@...nel.org, daniel@...earbox.net, andrii@...nel.org,
        martin.lau@...ux.dev, bpf@...r.kernel.org, song@...nel.org,
        yhs@...com, john.fastabend@...il.com, kpsingh@...nel.org,
        sdf@...gle.com, haoluo@...gle.com, jolsa@...nel.org, tj@...nel.org,
        linux-kernel@...r.kernel.org, kernel-team@...com
Subject: Re: [PATCH v5 3/4] bpf: Add libbpf logic for user-space ring buffer

On Mon, Sep 19, 2022 at 03:22:09PM -0500, David Vernet wrote:

[...]

> > > +       timeout_ns = timeout_ms * ns_per_ms;
> > > +       do {
> > > +               __u64 ns_remaining = timeout_ns - ns_elapsed;
> > > +               int cnt, ms_remaining;
> > > +               void *sample;
> > > +               struct timespec curr;
> > > +
> > > +               sample = user_ring_buffer__reserve(rb, size);
> > > +               if (sample)
> > > +                       return sample;
> > > +               else if (errno != ENOSPC)
> > > +                       return NULL;
> > > +
> > > +               ms_remaining = timeout_ms == -1 ? -1 : ns_remaining / ns_per_ms;
> > 
> > ok, so you've special-cased timeout_ms == -1 but still didn't do
> > max(0, ns_remaining). Can you prove that ns_elapsed will never be
> > bigger than timeout_ns due to various delays in waking up this thread?
> > If not, let's please have max(0) otherwise we can accidentally
> > epoll_wait(-1).
> 
> Yes you're right, this was an oversight. Thanks for catching this!

Wait, actually, this can't happen because of the while check at the end of
the loop:

while (ns_elapsed <= timeout_ns)

So I don't think the max is necessary to prevent underflowing, but I do
think we need to have one more attempt to invoke
user_ring_buffer__reserve() at the end of the function to account for
wakeup delays after epoll_wait() returns. Otherwise, we might think we've
timed out when there's actually a sample left in the buffer. I also still
think your suggestion below for cleaning up makes sense, so I'll still add
it in v6, but I think I can leave off the max() call.

> 
> > > +               /* The kernel guarantees at least one event notification
> > > +                * delivery whenever at least one sample is drained from the
> > > +                * ring buffer in an invocation to bpf_ringbuf_drain(). Other
> > > +                * additional events may be delivered at any time, but only one
> > > +                * event is guaranteed per bpf_ringbuf_drain() invocation,
> > > +                * provided that a sample is drained, and the BPF program did
> > > +                * not pass BPF_RB_NO_WAKEUP to bpf_ringbuf_drain().
> > > +                */
> > > +               cnt = epoll_wait(rb->epoll_fd, &rb->event, 1, ms_remaining);
> > > +               if (cnt < 0)
> > > +                       return NULL;
> > > +
> > > +               if (timeout_ms == -1)
> > > +                       continue;
> > > +
> > > +               err = clock_gettime(CLOCK_MONOTONIC, &curr);
> > > +               if (err)
> > > +                       return NULL;
> > > +
> > > +               ns_elapsed = ns_elapsed_timespec(&start, &curr);
> > 
> > nit: if you move re-calculation of ms_remaining and ns_remaining to
> > here, I think overall loop logic will be even more straightforwad. You
> > can initialize ms_remaining to -1 if timeout_ms < 0 and never
> > recalculate it, right? Note that you can also do ns_elapsed conversion
> > to ms right here and then keep everything else in ms (so no need for
> > timeout_ns, ns_remaining, etc).
> 
> Sounds good, let me give this a shot in v6.
> 
> Thanks for another detailed review!