Message-ID: <4248ac4d-9ef1-4aa6-2e6c-9c5097bebf9b@gmail.com>
Date: Mon, 4 Nov 2019 10:05:04 +0100
From: Rafał Miłecki <zajec5@...il.com>
To: Jens Axboe <axboe@...nel.dk>, Jackie Liu <liuyun01@...inos.cn>,
Alexander Viro <viro@...iv.linux.org.uk>,
linux-block@...r.kernel.org,
"David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Cc: John Crispin <john@...ozen.org>, Felix Fietkau <nbd@....name>
Subject: NAT performance regression caused by the a2d79c7174ae ("Merge tag
'for-5.3/io_uring-20190711' of git://git.kernel.dk/linux-block")
Hi,
I use Linux on home routers based on Broadcom's Northstar SoCs. Those
devices have ARM Cortex-A9 and most of them are dual-core. That CPU
isn't powerful enough to handle gigabit traffic, so all kinds of
optimizations really matter.
After switching from kernel 5.2 to 5.3 I noticed a NAT performance
regression (down from 805 Mb/s to 775 Mb/s). I tracked it down to commit
a2d79c7174ae ("Merge tag 'for-5.3/io_uring-20190711' of
git://git.kernel.dk/linux-block").
That issue is also present in v5.3.8.
The above merge adds 6 commits. Starting from the newest:
a4c0b3decb33 io_uring: fix io_sq_thread_stop running in front of io_sq_thread
aa1fa28fc73e io_uring: add support for recvmsg()
0fa03c624d8f io_uring: add support for sendmsg()
9e645e1105ca io_uring: add support for sqe links
9d93a3f5a0c0 io_uring: punt short reads to async context
87e5e6dab6c2 uio: make import_iovec()/compat_import_iovec() return bytes on success
I tested them one by one looking for performance drops.
1) 9d93a3f5a0c0 ("io_uring: punt short reads to async context")
NAT speed drop from 805 Mb/s to 791 Mb/s
2) 9e645e1105ca ("io_uring: add support for sqe links")
NAT speed drop from 791 Mb/s to 782 Mb/s
3) a4c0b3decb33 ("io_uring: fix io_sq_thread_stop running in front of io_sq_thread")
NAT speed drop from 782 Mb/s to 775 Mb/s
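To put the three steps in proportion, the drops can be expressed relative to the preceding measurement. A minimal sketch that only redoes the arithmetic on the figures quoted above (the `steps` list and the `relative_drops` helper are illustrative names, not part of the original test setup):

```python
# Throughput after each step, in Mb/s, copied from the measurements above.
steps = [
    ("before the merge", 805),
    ("9d93a3f5a0c0", 791),
    ("9e645e1105ca", 782),
    ("a4c0b3decb33", 775),
]


def relative_drops(steps):
    """Return (label, drop in Mb/s, drop in percent of the previous step)."""
    out = []
    for (_, before), (label, after) in zip(steps, steps[1:]):
        out.append((label, before - after, 100.0 * (before - after) / before))
    return out


for label, mbps, pct in relative_drops(steps):
    print(f"{label}: -{mbps} Mb/s ({pct:.2f}%)")

total_mbps = steps[0][1] - steps[-1][1]
total_pct = 100.0 * total_mbps / steps[0][1]
print(f"total: -{total_mbps} Mb/s ({total_pct:.2f}%)")
```

Each step costs roughly 1-2% and the three together about 3.7%, so each individual drop is small compared with the cumulative one.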
Do you have any idea why those changes affected my NAT performance and
if that can be fixed somehow?
I tried running "perf" + difffolded.pl before & after a2d79c7174ae
(svg attached) but I don't see much there. It's mostly just:
+0.25% __do_softirq
+0.20% [[xt_conntrack]]
It may be that running "perf" while doing NAT traffic affected the
results (NAT was much slower).
My test hardware is a BCM47094 SoC (dual-core ARM) with an integrated
network controller and an external BCM53012 switch.
Relevant setup:
* SoC network controller is wired to the hardware switch
* Switch passes 802.1q frames with VID 1 to four LAN ports
* Switch passes 802.1q frames with VID 2 to WAN port
* Linux does NAT between LAN (eth0.1) and WAN (eth0.2)
* I use "pfifo", "echo 2 > rps_cpus" and iperf for testing
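For reference, the value written to rps_cpus is a hexadecimal CPU bitmap, so "2" (binary 10) selects CPU 1 only. A small sketch of how such a mask decodes (the helper name `rps_mask_to_cpus` is made up for illustration):

```python
def rps_mask_to_cpus(mask):
    """Decode a hexadecimal CPU bitmap, as written to rps_cpus, into CPU indices."""
    # Wide masks are written as comma-separated 32-bit groups; drop the commas.
    value = int(mask.replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


print(rps_mask_to_cpus("2"))  # mask used in this setup
print(rps_mask_to_cpus("3"))  # both cores of a dual-core SoC
```

With mask "2" the helper returns only CPU 1, and with "3" it returns CPUs 0 and 1.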
Download attachment "a2d79c7174ae.svg" of type "image/svg+xml" (157765 bytes)