netdev - Re: [PATCH bpf-next v3 0/6] Introduce the BPF dispatcher


Message-ID: <20191209205010.153a9060@carbon>
Date:   Mon, 9 Dec 2019 20:50:10 +0100
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Björn Töpel <bjorn.topel@...il.com>
Cc:     Netdev <netdev@...r.kernel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        bpf <bpf@...r.kernel.org>,
        Magnus Karlsson <magnus.karlsson@...il.com>,
        "Karlsson, Magnus" <magnus.karlsson@...el.com>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        Edward Cree <ecree@...arflare.com>,
        Toke Høiland-Jørgensen <thoiland@...hat.com>,
        Andrii Nakryiko <andrii.nakryiko@...il.com>,
        brouer@...hat.com
Subject: Re: [PATCH bpf-next v3 0/6] Introduce the BPF dispatcher

On Mon, 9 Dec 2019 18:45:12 +0100
Björn Töpel <bjorn.topel@...il.com> wrote:

> On Mon, 9 Dec 2019 at 18:00, Jesper Dangaard Brouer <brouer@...hat.com> wrote:
> >
> > On Mon,  9 Dec 2019 14:55:16 +0100
> > Björn Töpel <bjorn.topel@...il.com> wrote:
> >  
> > > Performance
> > > ===========
> > >
> > > The tests were performed using the xdp_rxq_info sample program with
> > > the following command-line:
> > >
> > > 1. XDP_DRV:
> > >   # xdp_rxq_info --dev eth0 --action XDP_DROP
> > > 2. XDP_SKB:
> > >   # xdp_rxq_info --dev eth0 -S --action XDP_DROP
> > > 3. xdp-perf, from selftests/bpf:
> > >   # test_progs -v -t xdp_perf
> > >
> > >
> > > Run with mitigations=auto
> > > -------------------------
> > >
> > > Baseline:
> > > 1. 22.0 Mpps
> > > 2. 3.8 Mpps
> > > 3. 15 ns
> > >
> > > Dispatcher:
> > > 1. 29.4 Mpps (+34%)
> > > 2. 4.0 Mpps  (+5%)
> > > 3. 5 ns      (+66%)  
> >
> > Thanks for providing these extra measurement points.  This is good
> > work.  I just want to remind people that when working at these high
> > speeds, it is easy to get amazed by a +34% improvement, but we have to
> > be careful to understand that this is saving approx 10 ns time or
> > cycles.
> >
> > In reality, the time saved in #2 (3.8 Mpps -> 4.0 Mpps) is larger,
> > (1/3.8-1/4)*1000 = 13.16 ns, than in #1 (22.0 Mpps -> 29.4 Mpps),
> > (1/22-1/29.4)*1000 = 11.44 ns. Test #3 keeps us honest: 15 ns -> 5 ns =
> > 10 ns.  The 10 ns improvement is a big deal in XDP context, and also
> > corresponds to my own experience with retpoline (approx 12 ns overhead).
> >  
> 
> Ok, good! :-)
> 
> > To Björn, I would appreciate more digits on your Mpps numbers, so I get
> > more accuracy on my checks-and-balances I described above.  I suspect
> > the 3.8 Mpps -> 4.0 Mpps will be closer to the other numbers when we
> > get more accuracy.
> >  
> 
> Ok! Let me re-run them. 

Well, I don't think you should waste your time re-running these...

The numbers clearly show a significant improvement.  I'm just complaining
that I didn't have enough digits to do accurate checks-and-balances; they
are close enough that I believe them.
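
To make the checks-and-balances arithmetic concrete, here is a minimal
sketch of the Mpps to ns-per-packet conversion used above. The function
names and the half_step rounding model are my own assumptions for
illustration (rates reported with one decimal, so each may be off by up
to 0.05 Mpps); only the input numbers come from this thread.

```python
def ns_per_packet(mpps: float) -> float:
    """Per-packet time in nanoseconds at a rate given in Mpps."""
    return 1000.0 / mpps


def saved_ns(before_mpps: float, after_mpps: float) -> float:
    """Nanoseconds saved per packet when the rate rises from before to after."""
    return ns_per_packet(before_mpps) - ns_per_packet(after_mpps)


def saved_ns_bounds(before_mpps: float, after_mpps: float, half_step: float = 0.05):
    """Smallest and largest saving consistent with rates rounded to one decimal.

    half_step is an assumed rounding error of half the last printed digit.
    """
    low = saved_ns(before_mpps + half_step, after_mpps - half_step)
    high = saved_ns(before_mpps - half_step, after_mpps + half_step)
    return low, high


# Rates quoted in this thread for mitigations=auto (baseline -> dispatcher).
for label, before, after in [("XDP_DRV", 22.0, 29.4), ("XDP_SKB", 3.8, 4.0)]:
    low, high = saved_ns_bounds(before, after)
    print(f"{label}: {saved_ns(before, after):.2f} ns "
          f"(rounding range {low:.2f} .. {high:.2f} ns)")
```

Under that rounding assumption the XDP_SKB saving could be anywhere from
roughly 6.6 to 19.8 ns, while the XDP_DRV saving stays within roughly
11.3 to 11.6 ns, which is why one decimal is not enough at the lower
packet rates.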


> If you have some spare cycles, it would be
> great if you could try it out as well on your Mellanox setup.

I'll add it to my TODO list... but no promises.


> Historically you've always been able to get more stable numbers than
> I. :-)
> 
> >  
> > > Dispatcher (full; walk all entries, and fallback):
> > > 1. 20.4 Mpps (-7%)
> > > 2. 3.8 Mpps
> > > 3. 18 ns     (-20%)
> > >
> > > Run with mitigations=off
> > > ------------------------
> > >
> > > Baseline:
> > > 1. 29.6 Mpps
> > > 2. 4.1 Mpps
> > > 3. 5 ns
> > >
> > > Dispatcher:
> > > 1. 30.7 Mpps (+4%)
> > > 2. 4.1 Mpps
> > > 3. 5 ns  
> >
> > While +4% sounds good, it could be measurement noise ;-)
> >
> >  (1/29.6-1/30.7)*1000 = 1.21 ns
> >
> > Both #3 results say 5 ns.
> >  
> 
> True. Maybe that simply hints that we shouldn't use the dispatcher here?

No. I actually think it is worth exposing this code as much as
possible. And if it really is a 1.2 ns improvement, then I'll gladly take
that as well ;-)


I think this is awesome work! -- thanks for doing this!!!
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer