Message-Id: <a2a8af1622dff2bfd51d446aa8da2c1d2f6f543c.1611304190.git.lukas@wunner.de>
Date: Fri, 22 Jan 2021 09:47:01 +0100
From: Lukas Wunner <lukas@...ner.de>
To: "Pablo Neira Ayuso" <pablo@...filter.org>,
Jozsef Kadlecsik <kadlec@...filter.org>,
Florian Westphal <fw@...len.de>
Cc: netfilter-devel@...r.kernel.org, coreteam@...filter.org,
netdev@...r.kernel.org, Daniel Borkmann <daniel@...earbox.net>,
Alexei Starovoitov <ast@...nel.org>,
Eric Dumazet <edumazet@...gle.com>,
Thomas Graf <tgraf@...g.ch>,
Laura Garcia Liebana <nevola@...il.com>,
John Fastabend <john.fastabend@...il.com>
Subject: [PATCH nf-next v4 1/5] net: sched: Micro-optimize egress handling

sch_handle_egress() returns either the skb or NULL to signal to its
caller __dev_queue_xmit() whether a packet should continue to be
processed.

The skb is always non-NULL, otherwise __dev_queue_xmit() would hit a
NULL pointer deref right at its top.

But the compiler doesn't know that. So if sch_handle_egress() signals
success by returning the skb, the "if (!skb) goto out;" statement
results in a gratuitous NULL pointer check in the assembler output.

Avoid this by telling the compiler that __dev_queue_xmit() is never
passed a NULL skb. This also eliminates another gratuitous NULL pointer
check in the following call chain:

  __dev_queue_xmit()
    qdisc_pkt_len_init()
      skb_header_pointer()
        __skb_header_pointer()
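
As a rough userspace illustration of the idea (not the kernel code), the
sketch below marks a function with the same attribute. The names struct pkt,
handle_egress(), queue_xmit() and MAX_LEN are hypothetical stand-ins. Once
handle_egress() is inlined, an optimizing compiler is permitted to drop the
NULL test on the path where the pointer is returned unchanged; whether it
actually does depends on the compiler and its flags.

```c
#include <stddef.h>

#define MAX_LEN 1500	/* hypothetical drop threshold, for illustration only */

struct pkt {
	int len;
};

/*
 * Hypothetical stand-in for sch_handle_egress(): returns the packet to
 * continue processing, or NULL (with *ret set) to stop.
 */
static inline struct pkt *handle_egress(struct pkt *p, int *ret)
{
	if (p->len > MAX_LEN) {
		*ret = -1;
		return NULL;	/* packet dropped */
	}
	return p;		/* packet passes, pointer unchanged */
}

/*
 * Hypothetical stand-in for __dev_queue_xmit(): nonnull(1) promises the
 * compiler that the first argument is never NULL.
 */
__attribute__((nonnull(1)))
int queue_xmit(struct pkt *p)
{
	int rc = 0;

	p = handle_egress(p, &rc);
	if (!p)			/* on the pass path this test is redundant */
		return rc;

	return p->len;
}
```

Comparing the output of "gcc -O2 -S" with and without the attribute line is
one way to see whether a given compiler removes the test and jump.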

The speedup is barely measurable:

Before:  1877 1875 1878 1874 1882 1873 Mb/sec
After:   1877 1877 1880 1883 1888 1886 Mb/sec

However, we're about to add a netfilter egress hook to __dev_queue_xmit()
and without the micro-optimization, it will result in a performance
degradation which is indeed measurable:

With netfilter hook:                1853 1852 1850 1848 1849 1851 Mb/sec
With netfilter hook + micro-optim:  1874 1877 1881 1875 1876 1876 Mb/sec
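
To put the four series side by side, a small helper such as the following
can average the samples and express the difference in percent. This is only
a sketch; the function and array names are made up here, and the sample
values are copied from the measurements above.

```c
#include <stddef.h>

/* Throughput samples in Mb/sec, copied from the measurements above. */
const int before[]        = { 1877, 1875, 1878, 1874, 1882, 1873 };
const int after[]         = { 1877, 1877, 1880, 1883, 1888, 1886 };
const int hook[]          = { 1853, 1852, 1850, 1848, 1849, 1851 };
const int hook_and_opt[]  = { 1874, 1877, 1881, 1875, 1876, 1876 };

/* Arithmetic mean of n samples. */
double mean_mbps(const int *samples, size_t n)
{
	long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += samples[i];

	return (double)sum / (double)n;
}

/* Relative change from base to other, in percent (negative = slower). */
double rel_change_pct(double base, double other)
{
	return (other - base) / base * 100.0;
}
```

The means work out to 1876.5 (before), about 1881.8 (after), 1850.5 (with
netfilter hook) and 1876.5 (with netfilter hook + micro-optim), so the hook
alone costs roughly 1.4% relative to the baseline.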

The performance degradation is caused by a JNE instruction ("if (skb)")
being flipped to a JE instruction ("if (!skb)") once the netfilter hook
is added. The micro-optimization removes the test and jump instructions
altogether.

Measurements were performed on a Core i7-3615QM. Reproducer:

ip link add dev foo type dummy
ip link set dev foo up
tc qdisc add dev foo clsact
tc filter add dev foo egress bpf da bytecode '1,6 0 0 0,'

modprobe pktgen
echo "add_device foo" > /proc/net/pktgen/kpktgend_3
samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i foo -n 400000000 -m "11:11:11:11:11:11" -d 1.1.1.1

Signed-off-by: Lukas Wunner <lukas@...ner.de>
Cc: John Fastabend <john.fastabend@...il.com>
Cc: Daniel Borkmann <daniel@...earbox.net>
Cc: Alexei Starovoitov <ast@...nel.org>
Cc: Eric Dumazet <edumazet@...gle.com>
Cc: Thomas Graf <tgraf@...g.ch>
---
net/core/dev.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/core/dev.c b/net/core/dev.c
index 7afbb642e203..4c16b9932823 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4072,6 +4072,7 @@ struct netdev_queue *netdev_core_pick_tx(struct net_device *dev,
  * the BH enable code must have IRQs enabled so that it will not deadlock.
  * --BLG
  */
+__attribute__((nonnull(1)))
 static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 {
 	struct net_device *dev = skb->dev;
--
2.29.2