Message-ID: <Yyj47Bc01MvJyU9n@maniforge.dhcp.thefacebook.com>
Date:   Mon, 19 Sep 2022 18:19:08 -0500
From:   David Vernet <void@...ifault.com>
To:     Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc:     Joanne Koong <joannelkoong@...il.com>, ast@...nel.org,
        daniel@...earbox.net, andrii@...nel.org, martin.lau@...ux.dev,
        bpf@...r.kernel.org, song@...nel.org, yhs@...com,
        john.fastabend@...il.com, kpsingh@...nel.org, sdf@...gle.com,
        haoluo@...gle.com, jolsa@...nel.org, tj@...nel.org,
        linux-kernel@...r.kernel.org, kernel-team@...com
Subject: Re: [PATCH v5 2/4] bpf: Add bpf_user_ringbuf_drain() helper

On Mon, Sep 19, 2022 at 03:19:36PM -0500, David Vernet wrote:
> > > +static void __bpf_user_ringbuf_sample_release(struct bpf_ringbuf *rb, size_t size, u64 flags)
> > > +{
> > > +       u64 consumer_pos;
> > > +       u32 rounded_size = round_up(size + BPF_RINGBUF_HDR_SZ, 8);
> > > +
> > > +       /* Using smp_load_acquire() is unnecessary here, as the busy-bit
> > > +        * prevents another task from writing to consumer_pos after it was read
> > > +        * by this task with smp_load_acquire() in __bpf_user_ringbuf_peek().
> > > +        */
> > > +       consumer_pos = rb->consumer_pos;
> > > +        /* Synchronizes with smp_load_acquire() in user-space producer. */
> > > +       smp_store_release(&rb->consumer_pos, consumer_pos + rounded_size);
> > > +
> > > +       /* Prevent the clearing of the busy-bit from being reordered before the
> > > +        * storing of the updated rb->consumer_pos value.
> > > +        */
> > > +       smp_mb__before_atomic();
> > > +       atomic_set(&rb->busy, 0);
> > > +
> > > +       if (flags & BPF_RB_FORCE_WAKEUP)
> > > +               irq_work_queue(&rb->work);
> > 
> > I think this part is new; you decided to define that FORCE_WAKEUP
> > sends a wakeup after every single consumed sample? I have no strong
> > opinion on this, tbh, just wondering whether it wouldn't be enough to
> > do it once after drain?
> 
> I didn't have a strong reason for doing this other than that I think it
> more closely matches the behavior for BPF_MAP_TYPE_RINGBUF (which invokes
> irq_work_queue() after every call to bpf_ringbuf_commit() if
> BPF_RB_FORCE_WAKEUP is passed). Let's just match that behavior unless we
> have a good reason not to? I think that will be more intuitive for users.
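
For reference, the BPF_MAP_TYPE_RINGBUF behavior I'm referring to is the
tail of bpf_ringbuf_commit(), which (from memory, variable names
approximate) looks roughly like this, i.e. one wakeup decision per
committed sample, unconditional when BPF_RB_FORCE_WAKEUP is set:

	/* End of bpf_ringbuf_commit(): one wakeup decision per sample. */
	if (flags & BPF_RB_FORCE_WAKEUP)
		irq_work_queue(&rb->work);
	else if (cons_pos == rec_pos && !(flags & BPF_RB_NO_WAKEUP))
		irq_work_queue(&rb->work);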

Hmm, something else to consider is that if we move the busy-bit setting
into bpf_user_ringbuf_drain() per your suggestion below, the critical
section becomes the whole sample drain loop. That's of course _not_ the
case for BPF_MAP_TYPE_RINGBUF, which only holds the spinlock while
reserving the sample. It seems excessive to invoke irq_work_queue() while
the busy bit is held, so I think the behavior should just be that
BPF_RB_FORCE_WAKEUP means a single wakeup is always sent after the drain,
even if no sample was drained.

Let me know if you disagree, but for now I'll work on spinning up a v6 that
only issues the forced wakeup event once after drain.
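
Concretely, I'm picturing something along these lines for the v6 drain
(just a rough sketch to show the control flow, not the final code: the
per-sample error and discard handling is elided, and it assumes
__bpf_user_ringbuf_sample_release() gets trimmed down to only advancing
consumer_pos):

BPF_CALL_4(bpf_user_ringbuf_drain, struct bpf_map *, map,
	   void *, callback_fn, void *, callback_ctx, u64, flags)
{
	struct bpf_ringbuf *rb;
	long samples, ret = 0;
	bpf_callback_t callback = (bpf_callback_t)callback_fn;
	u64 wakeup_flags = BPF_RB_NO_WAKEUP | BPF_RB_FORCE_WAKEUP;
	int busy = 0;

	if (unlikely(flags & ~wakeup_flags))
		return -EINVAL;

	rb = container_of(map, struct bpf_ringbuf_map, map)->rb;

	/* Take the busy bit once for the whole drain loop, rather than
	 * acquiring and releasing it in __bpf_user_ringbuf_peek() for
	 * every sample.
	 */
	if (!atomic_try_cmpxchg(&rb->busy, &busy, 1))
		return -EBUSY;

	for (samples = 0; samples < BPF_MAX_USER_RINGBUF_SAMPLES && ret == 0; samples++) {
		u32 size;
		void *sample;
		struct bpf_dynptr_kern dynptr;

		if (__bpf_user_ringbuf_peek(rb, &sample, &size))
			break;

		bpf_dynptr_init(&dynptr, sample, BPF_DYNPTR_TYPE_LOCAL, 0, size);
		ret = callback((uintptr_t)&dynptr, (uintptr_t)callback_ctx, 0, 0, 0);
		__bpf_user_ringbuf_sample_release(rb, size);
	}

	/* Prevent the clearing of the busy bit from being reordered before
	 * the consumer_pos updates in the sample releases above.
	 */
	smp_mb__before_atomic();
	atomic_set(&rb->busy, 0);

	/* At most one wakeup per drain: unconditionally if
	 * BPF_RB_FORCE_WAKEUP was passed, otherwise only if at least one
	 * sample was consumed.
	 */
	if (flags & BPF_RB_FORCE_WAKEUP)
		irq_work_queue(&rb->work);
	else if (!(flags & BPF_RB_NO_WAKEUP) && samples > 0)
		irq_work_queue(&rb->work);

	return samples;
}

That keeps the FORCE_WAKEUP semantics trivial to reason about (exactly one
wakeup per drain call), and the busy-bit atomics are paid once per drain
rather than once per sample.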

> > > +}
> > > +
> > > +BPF_CALL_4(bpf_user_ringbuf_drain, struct bpf_map *, map,
> > > +          void *, callback_fn, void *, callback_ctx, u64, flags)
> > > +{
> > > +       struct bpf_ringbuf *rb;
> > > +       long samples, discarded_samples = 0, ret = 0;
> > > +       bpf_callback_t callback = (bpf_callback_t)callback_fn;
> > > +       u64 wakeup_flags = BPF_RB_NO_WAKEUP | BPF_RB_FORCE_WAKEUP;
> > > +
> > > +       if (unlikely(flags & ~wakeup_flags))
> > > +               return -EINVAL;
> > > +
> > > +       rb = container_of(map, struct bpf_ringbuf_map, map)->rb;
> > > +       for (samples = 0; samples < BPF_MAX_USER_RINGBUF_SAMPLES && ret == 0; samples++) {
> > > +               int err;
> > > +               u32 size;
> > > +               void *sample;
> > > +               struct bpf_dynptr_kern dynptr;
> > > +
> > > +               err = __bpf_user_ringbuf_peek(rb, &sample, &size);
> > 
> > so I also just realized that ringbuf_peek will keep setting/resetting
> > the busy flag, and in practically all cases that's completely useless
> > work, as we don't intend to have competing consumers, right? So maybe
> > move busy-bit handling into drain itself and document that peek
> > expects busy to already be taken care of?
> > 
> > This should be noticeably faster when there are multiple records
> > consumed in one drain.
> 
> Great idea, I'll do this in v6.
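
To make the split concrete: with the busy bit and the wakeup both handled
in bpf_user_ringbuf_drain(), the release helper should shrink to just the
consumer_pos update, something like this (again just a sketch, based on
the v5 code above; the now-unused flags argument can go away too, since
the wakeup decision moves to the caller):

static void __bpf_user_ringbuf_sample_release(struct bpf_ringbuf *rb, size_t size)
{
	u64 consumer_pos;
	u32 rounded_size = round_up(size + BPF_RINGBUF_HDR_SZ, 8);

	/* Using smp_load_acquire() is unnecessary here, as the busy bit
	 * (now taken for the whole loop in bpf_user_ringbuf_drain())
	 * prevents another task from writing to consumer_pos while we
	 * hold it.
	 */
	consumer_pos = rb->consumer_pos;

	/* Synchronizes with smp_load_acquire() in the user-space producer. */
	smp_store_release(&rb->consumer_pos, consumer_pos + rounded_size);
}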

Thanks,
David
