Date:   Mon, 9 Dec 2019 18:00:08 +0100
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Björn Töpel <bjorn.topel@...il.com>
Cc:     brouer@...hat.com, netdev@...r.kernel.org, ast@...nel.org,
        daniel@...earbox.net, bpf@...r.kernel.org,
        magnus.karlsson@...il.com, magnus.karlsson@...el.com,
        jonathan.lemon@...il.com, ecree@...arflare.com,
        thoiland@...hat.com, andrii.nakryiko@...il.com
Subject: Re: [PATCH bpf-next v3 0/6] Introduce the BPF dispatcher

On Mon,  9 Dec 2019 14:55:16 +0100
Björn Töpel <bjorn.topel@...il.com> wrote:

> Performance
> ===========
> 
> The tests were performed using the xdp_rxq_info sample program with
> the following command-line:
> 
> 1. XDP_DRV:
>   # xdp_rxq_info --dev eth0 --action XDP_DROP
> 2. XDP_SKB:
>   # xdp_rxq_info --dev eth0 -S --action XDP_DROP
> 3. xdp-perf, from selftests/bpf:
>   # test_progs -v -t xdp_perf
> 
> 
> Run with mitigations=auto
> -------------------------
> 
> Baseline:
> 1. 22.0 Mpps
> 2. 3.8 Mpps
> 3. 15 ns
> 
> Dispatcher:
> 1. 29.4 Mpps (+34%)
> 2. 4.0 Mpps  (+5%)
> 3. 5 ns      (+66%)

Thanks for providing these extra measurement points.  This is good
work.  I just want to remind people that when working at these high
speeds, it is easy to get amazed by a +34% improvement, but we have to
be careful to understand that this is saving approx 10 ns of time, or
the equivalent cycles.

In reality, the cycles or time saved in #2 (3.8 Mpps -> 4.0 Mpps),
(1/3.8-1/4)*1000 = 13.16 ns, is larger than in #1 (22.0 Mpps -> 29.4 Mpps),
(1/22-1/29.4)*1000 = 11.44 ns.  Test #3 keeps us honest: 15 ns -> 5 ns =
10 ns.  The 10 ns improvement is a big deal in an XDP context, and it also
corresponds to my own experience with retpoline (approx 12 ns overhead).

To Björn: I would appreciate more digits on your Mpps numbers, to give
more accuracy to the checks-and-balances I describe above.  I suspect
the 3.8 Mpps -> 4.0 Mpps delta will come closer to the other numbers
once we have more accuracy.
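As a quick sketch of the arithmetic above: at M Mpps each packet takes
1000/M nanoseconds, so the per-packet time saved between two rates is the
difference of those values. A few lines of Python reproduce the numbers
quoted in this thread (function name is mine, just for illustration):

```python
def ns_saved(before_mpps, after_mpps):
    """Per-packet nanoseconds saved when going from before_mpps to after_mpps.

    At M Mpps each packet takes 1/M microseconds = 1000/M nanoseconds,
    so the saving is the difference of the per-packet times.
    """
    return 1000.0 / before_mpps - 1000.0 / after_mpps

# The three comparisons from this thread:
print(round(ns_saved(22.0, 29.4), 2))  # XDP_DRV,  mitigations=auto -> 11.44
print(round(ns_saved(3.8, 4.0), 2))    # XDP_SKB,  mitigations=auto -> 13.16
print(round(ns_saved(29.6, 30.7), 2))  # XDP_DRV,  mitigations=off  -> 1.21
```

This also shows why percentage gains are misleading at these rates: the
smaller-looking +5% in the XDP_SKB case actually saves more nanoseconds
per packet than the +34% in the XDP_DRV case.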

 
> Dispatcher (full; walk all entries, and fallback):
> 1. 20.4 Mpps (-7%)
> 2. 3.8 Mpps  
> 3. 18 ns     (-20%)
> 
> Run with mitigations=off
> ------------------------
> 
> Baseline:
> 1. 29.6 Mpps
> 2. 4.1 Mpps
> 3. 5 ns
> 
> Dispatcher:
> 1. 30.7 Mpps (+4%)
> 2. 4.1 Mpps
> 3. 5 ns

While +4% sounds good, it could be measurement noise ;-)

 (1/29.6-1/30.7)*1000 = 1.21 ns

especially as both #3 results say 5 ns.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
