Message-ID: <5650fdb7-5b86-9c7d-c112-1ff5ee7812b7@iogearbox.net>
Date: Tue, 29 Jan 2019 12:17:14 +0100
From: Daniel Borkmann <daniel@...earbox.net>
To: bjorn.topel@...il.com, intel-wired-lan@...ts.osuosl.org
Cc: Björn Töpel <bjorn.topel@...el.com>,
pmenzel@...gen.mpg.de, brouer@...hat.com,
magnus.karlsson@...el.com, magnus.karlsson@...il.com,
netdev@...r.kernel.org, alexei.starovoitov@...il.com,
davem@...emloft.net
Subject: Re: [PATCH v2] i40e: replace switch-statement to speed-up retpoline-enabled builds

On 01/29/2019 10:57 AM, bjorn.topel@...il.com wrote:
> From: Björn Töpel <bjorn.topel@...el.com>
>
> GCC will generate jump tables for switch-statements with more than 5
> case statements. An entry into the jump table is an indirect call,
> which means that for CONFIG_RETPOLINE builds, this is rather
> expensive.
>
> This commit replaces the switch-statement that acts on the XDP program
> result with an if-clause.
>
> The if-clause was also refactored into a common function that can be
> used by AF_XDP zero-copy and non-zero-copy code.
>
> Performance prior to this patch:
> $ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
> Running XDP on dev:enp134s0f0 (ifindex:7) action:XDP_DROP options:no_touch
> XDP stats       CPU     pps         issue-pps
> XDP-RX CPU      20      18983018    0
> XDP-RX CPU      total   18983018
>
> RXQ stats       RXQ:CPU pps         issue-pps
> rx_queue_index   20:20  18983012    0
> rx_queue_index   20:sum 18983012
>
> $ sudo ./xdpsock -i enp134s0f0 -q 20 -n 2 -z -r
> sock0@...134s0f0:20 rxdrop
>                 pps         pkts        2.00
> rx              14,641,496  144,751,092
> tx              0           0
>
> And after:
> $ sudo ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP
> Running XDP on dev:enp134s0f0 (ifindex:7) action:XDP_DROP options:no_touch
> XDP stats       CPU     pps         issue-pps
> XDP-RX CPU      20      24000986    0
> XDP-RX CPU      total   24000986
>
> RXQ stats       RXQ:CPU pps         issue-pps
> rx_queue_index   20:20  24000985    0
> rx_queue_index   20:sum 24000985
>
> +26%
>
> $ sudo ./xdpsock -i enp134s0f0 -q 20 -n 2 -z -r
> sock0@...134s0f0:20 rxdrop
>                 pps         pkts        2.00
> rx              17,623,578  163,503,263
> tx              0           0
>
> +20%
>
> Signed-off-by: Björn Töpel <bjorn.topel@...el.com>

Looks good. Given the performance improvements, I wonder whether it would
make sense in general to raise the default threshold for generating jump
tables when CONFIG_RETPOLINE is enabled, as in:

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 9c5a67d..33495a9 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -217,6 +217,8 @@ KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
 # Avoid indirect branches in kernel to deal with Spectre
 ifdef CONFIG_RETPOLINE
   KBUILD_CFLAGS += $(RETPOLINE_CFLAGS)
+  # Avoid generating slow indirect jumps for small number of switch cases
+  KBUILD_CFLAGS += --param case-values-threshold=12
 endif
 
 archscripts: scripts_basic

That would likely bloat the kernel a bit, including in slow-path places where
it is not needed, but it would generically catch the majority of cases. I'll
run some experiments later today (in any case, that should not block this
patch).
Cheers,
Daniel