Message-ID: <CAJ+HfNgBT9TCEiHxj78ZZgByGZrfhv4d_1UwAAwK_VRAX6AY7Q@mail.gmail.com>
Date:   Tue, 29 Jan 2019 14:17:05 +0100
From:   Björn Töpel <bjorn.topel@...il.com>
To:     Daniel Borkmann <daniel@...earbox.net>
Cc:     intel-wired-lan <intel-wired-lan@...ts.osuosl.org>,
        Björn Töpel <bjorn.topel@...el.com>,
        Paul Menzel <pmenzel@...gen.mpg.de>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        "Karlsson, Magnus" <magnus.karlsson@...el.com>,
        Magnus Karlsson <magnus.karlsson@...il.com>,
        Netdev <netdev@...r.kernel.org>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>,
        David Miller <davem@...emloft.net>
Subject: Re: [PATCH v2] i40e: replace switch-statement to speed-up
 retpoline-enabled builds

On Tue, 29 Jan 2019 at 12:17, Daniel Borkmann <daniel@...earbox.net> wrote:
>
> On 01/29/2019 10:57 AM, bjorn.topel@...il.com wrote:
> > From: Björn Töpel <bjorn.topel@...el.com>
> >
> > GCC will generate jump tables for switch-statements with more than 5
> > case statements. Dispatching through the jump table is an indirect
> > jump, which means that for CONFIG_RETPOLINE builds, this is rather
> > expensive.
> >
> > This commit replaces the switch-statement that acts on the XDP program
> > result with an if-clause.
> >
> > The if-clause was also refactored into a common function that can be
> > used by AF_XDP zero-copy and non-zero-copy code.
> >
> > Performance prior to this patch:
> > $ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
> > Running XDP on dev:enp134s0f0 (ifindex:7) action:XDP_DROP options:no_touch
> > XDP stats       CPU     pps         issue-pps
> > XDP-RX CPU      20      18983018    0
> > XDP-RX CPU      total   18983018
> >
> > RXQ stats       RXQ:CPU pps         issue-pps
> > rx_queue_index   20:20  18983012    0
> > rx_queue_index   20:sum 18983012
> >
> > $ sudo ./xdpsock -i enp134s0f0 -q 20 -n 2 -z -r
> >  sock0@...134s0f0:20 rxdrop
> >                 pps         pkts        2.00
> > rx              14,641,496  144,751,092
> > tx              0           0
> >
> > And after:
> > $ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
> > Running XDP on dev:enp134s0f0 (ifindex:7) action:XDP_DROP options:no_touch
> > XDP stats       CPU     pps         issue-pps
> > XDP-RX CPU      20      24000986    0
> > XDP-RX CPU      total   24000986
> >
> > RXQ stats       RXQ:CPU pps         issue-pps
> > rx_queue_index   20:20  24000985    0
> > rx_queue_index   20:sum 24000985
> >
> >   +26%
> >
> > $ sudo ./xdpsock -i enp134s0f0 -q 20 -n 2 -z -r
> >  sock0@...134s0f0:20 rxdrop
> >                 pps         pkts        2.00
> > rx              17,623,578  163,503,263
> > tx              0           0
> >
> >   +20%
> >
> > Signed-off-by: Björn Töpel <bjorn.topel@...el.com>
>
> Looks good. Given the performance improvements, wondering in general whether
> it would make sense to raise the default limit for generating jump tables if
> we have CONFIG_RETPOLINE enabled; as in:
>
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index 9c5a67d..33495a9 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -217,6 +217,8 @@ KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
>  # Avoid indirect branches in kernel to deal with Spectre
>  ifdef CONFIG_RETPOLINE
>    KBUILD_CFLAGS += $(RETPOLINE_CFLAGS)
> +  # Avoid generating slow indirect jumps for small number of switch cases
> +  KBUILD_CFLAGS += --param case-values-threshold=12

Yes, it might make sense to raise it. All XDP capable drivers use a
switch to act on the action.

GCC's default threshold for x86-64 is 5; I'm curious why you're
suggesting 12, I'd pick 17. ;-P


Björn

>  endif
>
>  archscripts: scripts_basic
>
> That would likely bloat the kernel a bit, also in slow-path places where it
> would not be needed, but it would generically catch the majority of cases.
> I'll run some experiments later today (but in any case that should not block
> this patch here).
>
> Cheers,
> Daniel
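
For readers following the thread, the source-level change discussed above can be sketched as follows. This is a minimal user-space illustration, not the i40e code: the enum mirrors the XDP action values from the kernel UAPI header, while the RES_* flags and the two function names are hypothetical. Whether GCC actually lowers the switch to a jump table depends on compiler version, target and flags (including the case-values-threshold parameter mentioned in the thread), so the sketch only shows the two equivalent shapes.

```c
#include <stdint.h>

/* XDP action codes, mirroring enum xdp_action in the kernel UAPI header
 * <linux/bpf.h>; redeclared here so the sketch builds in user space. */
enum xdp_action {
	XDP_ABORTED = 0,
	XDP_DROP,
	XDP_PASS,
	XDP_TX,
	XDP_REDIRECT,
};

/* Hypothetical result flags, invented for this illustration only. */
#define RES_PASS     0u
#define RES_CONSUMED 1u
#define RES_TX       2u
#define RES_REDIR    4u

/* Switch form: with enough cases the compiler may emit a jump table,
 * i.e. an indirect jump, which is costly when retpolines are enabled. */
static unsigned int result_switch(uint32_t act)
{
	switch (act) {
	case XDP_PASS:
		return RES_PASS;
	case XDP_TX:
		return RES_TX;
	case XDP_REDIRECT:
		return RES_REDIR;
	default:		/* unknown action: treat like a drop */
	case XDP_ABORTED:
	case XDP_DROP:
		return RES_CONSUMED;
	}
}

/* If-chain form: the same mapping expressed as compare-and-branch,
 * so no jump table is involved. */
static unsigned int result_if(uint32_t act)
{
	if (act == XDP_PASS)
		return RES_PASS;
	if (act == XDP_TX)
		return RES_TX;
	if (act == XDP_REDIRECT)
		return RES_REDIR;
	return RES_CONSUMED;	/* aborted, drop and unknown actions */
}
```

Both functions return the same value for every input; the only intended difference is the kind of branch the compiler is likely to generate.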
