Message-ID: <CAJ+HfNgBT9TCEiHxj78ZZgByGZrfhv4d_1UwAAwK_VRAX6AY7Q@mail.gmail.com>
Date: Tue, 29 Jan 2019 14:17:05 +0100
From: Björn Töpel <bjorn.topel@...il.com>
To: Daniel Borkmann <daniel@...earbox.net>
Cc: intel-wired-lan <intel-wired-lan@...ts.osuosl.org>,
Björn Töpel <bjorn.topel@...el.com>,
Paul Menzel <pmenzel@...gen.mpg.de>,
Jesper Dangaard Brouer <brouer@...hat.com>,
"Karlsson, Magnus" <magnus.karlsson@...el.com>,
Magnus Karlsson <magnus.karlsson@...il.com>,
Netdev <netdev@...r.kernel.org>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
David Miller <davem@...emloft.net>
Subject: Re: [PATCH v2] i40e: replace switch-statement to speed-up
retpoline-enabled builds
On Tue, 29 Jan 2019 at 12:17, Daniel Borkmann <daniel@...earbox.net> wrote:
>
> On 01/29/2019 10:57 AM, bjorn.topel@...il.com wrote:
> > From: Björn Töpel <bjorn.topel@...el.com>
> >
> > GCC will generate jump tables for switch-statements with more than 5
> > case statements. An entry into the jump table is an indirect call,
> > which means that for CONFIG_RETPOLINE builds, this is rather
> > expensive.
> >
> > This commit replaces the switch-statement that acts on the XDP program
> > result with an if-clause.
> >
> > The if-clause was also refactored into a common function that can be
> > used by AF_XDP zero-copy and non-zero-copy code.
> >
> > Performance prior to this patch:
> > $ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
> > Running XDP on dev:enp134s0f0 (ifindex:7) action:XDP_DROP options:no_touch
> > XDP stats CPU pps issue-pps
> > XDP-RX CPU 20 18983018 0
> > XDP-RX CPU total 18983018
> >
> > RXQ stats RXQ:CPU pps issue-pps
> > rx_queue_index 20:20 18983012 0
> > rx_queue_index 20:sum 18983012
> >
> > $ sudo ./xdpsock -i enp134s0f0 -q 20 -n 2 -z -r
> > sock0@...134s0f0:20 rxdrop
> > pps pkts 2.00
> > rx 14,641,496 144,751,092
> > tx 0 0
> >
> > And after:
> > $ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
> > Running XDP on dev:enp134s0f0 (ifindex:7) action:XDP_DROP options:no_touch
> > XDP stats CPU pps issue-pps
> > XDP-RX CPU 20 24000986 0
> > XDP-RX CPU total 24000986
> >
> > RXQ stats RXQ:CPU pps issue-pps
> > rx_queue_index 20:20 24000985 0
> > rx_queue_index 20:sum 24000985
> >
> > +26%
> >
> > $ sudo ./xdpsock -i enp134s0f0 -q 20 -n 2 -z -r
> > sock0@...134s0f0:20 rxdrop
> > pps pkts 2.00
> > rx 17,623,578 163,503,263
> > tx 0 0
> >
> > +20%
> >
> > Signed-off-by: Björn Töpel <bjorn.topel@...el.com>
>
> Looks good. Given the performance improvements, wondering in general whether
> it would make sense to raise the default limit for generating jump tables if
> we have CONFIG_RETPOLINE enabled; as in:
>
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index 9c5a67d..33495a9 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -217,6 +217,8 @@ KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> # Avoid indirect branches in kernel to deal with Spectre
> ifdef CONFIG_RETPOLINE
> KBUILD_CFLAGS += $(RETPOLINE_CFLAGS)
> + # Avoid generating slow indirect jumps for small number of switch cases
> + KBUILD_CFLAGS += --param case-values-threshold=12

Yes, it might make sense to raise it. All XDP-capable drivers use a
switch to act on the XDP action.

GCC's default for x86-64 is 5; I'm curious why you're suggesting 12.
I'd pick 17. ;-P

Björn

> endif
>
> archscripts: scripts_basic
>
> That would likely bloat the kernel a bit also in slow-path places where it
> would not be needed, but it would generically catch majority of cases. I'll
> run some experiments later today (but in any case that should not block this
> patch here).
>
> Cheers,
> Daniel
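
For readers following the thread, the transformation the patch describes
(acting on the XDP program result with an if-chain instead of a switch)
can be sketched roughly as below. This is a minimal illustration, not the
i40e code: the enum names, the result codes, and all three functions are
hypothetical stand-ins invented for this sketch. Whether the compiler
actually emits a jump table for the switch version depends on the
compiler, its version, and its flags.

```c
#include <assert.h>

/* Hypothetical stand-ins for the XDP action codes. The values follow the
 * usual ordering of the kernel's enum xdp_action, but the names are local
 * to this sketch. */
enum sketch_xdp_action {
	SKETCH_XDP_ABORTED = 0,
	SKETCH_XDP_DROP,
	SKETCH_XDP_PASS,
	SKETCH_XDP_TX,
	SKETCH_XDP_REDIRECT,
};

/* Hypothetical driver-side result codes (assumed, not the real i40e ones). */
enum sketch_result {
	SKETCH_RES_PASS = 0,
	SKETCH_RES_CONSUMED = 1,
	SKETCH_RES_TX = 2,
	SKETCH_RES_REDIR = 4,
};

/* Switch-based dispatch: once a switch has enough cases, the compiler may
 * lower it to a jump table, i.e. an indirect branch. */
int result_from_switch(unsigned int act)
{
	switch (act) {
	case SKETCH_XDP_PASS:
		return SKETCH_RES_PASS;
	case SKETCH_XDP_TX:
		return SKETCH_RES_TX;
	case SKETCH_XDP_REDIRECT:
		return SKETCH_RES_REDIR;
	case SKETCH_XDP_ABORTED:
	case SKETCH_XDP_DROP:
	default:
		return SKETCH_RES_CONSUMED;
	}
}

/* If-chain dispatch: only compare-and-branch sequences, so no jump table
 * is needed for the dispatch itself. */
int result_from_if_chain(unsigned int act)
{
	if (act == SKETCH_XDP_PASS)
		return SKETCH_RES_PASS;
	if (act == SKETCH_XDP_TX)
		return SKETCH_RES_TX;
	if (act == SKETCH_XDP_REDIRECT)
		return SKETCH_RES_REDIR;
	/* Aborted, drop, and unknown actions all consume the frame. */
	return SKETCH_RES_CONSUMED;
}

/* Sanity check: both variants agree for known and unknown action values. */
void check_variants_agree(void)
{
	unsigned int act;

	for (act = 0; act < 8; act++)
		assert(result_from_switch(act) == result_from_if_chain(act));
}
```

Both variants are functionally equivalent; the difference is only in the
code the compiler is free to generate for the dispatch.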