Date:   Fri, 22 May 2020 12:07:46 -0700
From:   Andrii Nakryiko <andrii.nakryiko@...il.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     Andrii Nakryiko <andriin@...com>, bpf <bpf@...r.kernel.org>,
        Networking <netdev@...r.kernel.org>,
        Alexei Starovoitov <ast@...com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Kernel Team <kernel-team@...com>,
        "Paul E . McKenney" <paulmck@...nel.org>,
        Jonathan Lemon <jonathan.lemon@...il.com>
Subject: Re: [PATCH v2 bpf-next 6/7] bpf: add BPF ringbuf and perf buffer benchmarks

On Thu, May 21, 2020 at 6:21 PM Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
>
> On Sun, May 17, 2020 at 12:57:26PM -0700, Andrii Nakryiko wrote:
> > +
> > +static inline int roundup_len(__u32 len)
> > +{
> > +     /* clear out top 2 bits */
> > +     len <<= 2;
> > +     len >>= 2;
> > +     /* add length prefix */
> > +     len += RINGBUF_META_LEN;
> > +     /* round up to 8 byte alignment */
> > +     return (len + 7) / 8 * 8;
> > +}
>
> the same round_up again?
>
> > +
> > +static void ringbuf_custom_process_ring(struct ringbuf_custom *r)
> > +{
> > +     unsigned long cons_pos, prod_pos;
> > +     int *len_ptr, len;
> > +     bool got_new_data;
> > +
> > +     cons_pos = smp_load_acquire(r->consumer_pos);
> > +     while (true) {
> > +             got_new_data = false;
> > +             prod_pos = smp_load_acquire(r->producer_pos);
> > +             while (cons_pos < prod_pos) {
> > +                     len_ptr = r->data + (cons_pos & r->mask);
> > +                     len = smp_load_acquire(len_ptr);
> > +
> > +                     /* sample not committed yet, bail out for now */
> > +                     if (len & RINGBUF_BUSY_BIT)
> > +                             return;
> > +
> > +                     got_new_data = true;
> > +                     cons_pos += roundup_len(len);
> > +
> > +                     atomic_inc(&buf_hits.value);
> > +             }
> > +             if (got_new_data)
> > +                     smp_store_release(r->consumer_pos, cons_pos);
> > +             else
> > +                     break;
> > +     };
> > +}
>
> copy paste from libbpf? why?

Yep, it's deliberate; see the description of the rb-custom benchmark in the commit message.

Basically, I was worried that generic libbpf callback calling might
introduce a noticeable slowdown, so I wanted to test the case where
ultimate performance is necessary and someone would just implement a
custom reading loop with inlined processing logic. And it turned out
to be noticeable, see the benchmark results for the rb-libbpf and
rb-custom benchmarks. So I satisfied my curiosity :), but if you think
that's not necessary, I can drop rb-custom (and, similarly, pb-custom
for perfbuf). It still seems useful to me, though (and serves as a
sort of example of a minimal correct implementation of
ringbuf/perfbuf reading).
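
For contrast, the generic path that rb-libbpf exercises looks roughly
like this (a minimal sketch against the ring_buffer API added in this
series; map_fd, the done flag, and error handling are placeholders):

    #include <bpf/libbpf.h>

    /* every sample goes through this indirect call */
    static int handle_sample(void *ctx, void *data, size_t size)
    {
            /* per-sample processing */
            return 0;
    }

    struct ring_buffer *rb;

    rb = ring_buffer__new(map_fd, handle_sample, NULL, NULL);
    if (!rb)
            return -1; /* handle error */
    while (!done)
            ring_buffer__poll(rb, 100 /* timeout, ms */);
    ring_buffer__free(rb);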

Btw, apart from the speedup from avoiding indirect calls (due to the
callback), this algorithm also does a "batched"
smp_store_release(r->consumer_pos) only once after each inner
`while (cons_pos < prod_pos)` loop, so there is theoretically less
cache line bouncing, which might give a bit more speed as well. Libbpf
pessimistically does smp_store_release() after each record, on the
assumption that a user-provided callback might take a while, so it's
better to give producers a bit more space back as early as possible.
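
To illustrate, the per-record variant is effectively the same inner
loop with only the position of the release store changed (a
simplified sketch; e.g. discarded-sample handling is omitted):

    while (cons_pos < prod_pos) {
            len_ptr = r->data + (cons_pos & r->mask);
            len = smp_load_acquire(len_ptr);
            if (len & RINGBUF_BUSY_BIT)
                    return;
            /* user callback runs here and may take a while */
            cons_pos += roundup_len(len);
            /* release after *each* record: space goes back to
             * producers sooner, at the cost of one release store
             * (and potential cache line bounce) per record
             */
            smp_store_release(r->consumer_pos, cons_pos);
    }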
