lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <5781211.jtEvgqSZyO@stwm.de>
Date:   Thu, 13 Sep 2018 13:30:12 +0200
From:   Wolfgang Walter <linux@...m.de>
To:     netdev@...r.kernel.org
Cc:     Florian Westphal <fw@...len.de>,
        Steffen Klassert <steffen.klassert@...unet.com>,
        David Miller <davem@...hat.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Regression: kernel 4.14 an later very slow with many ipsec tunnels

Hello,

thanks to the fix from Steffen Klassert I could now run 4.14.69 + his patch 
and 4.18.7 + his patch without oopsing immediately.

But I found that those kernels perform very bad. They perform so bad that they 
are unusable for our router with about 3000 ipsec tunnels (tunnel mode network 
<-> network).

With 4.9. (and all other kernels I used in the last 10 years with much less 
potent hardware) I never had an comparable performance issue with networking.

4.18.7 is better then 4.14.69 but still remains unusuable for us.

Even with very little traffic all 8 cores are working 100% in ksoftirqd. As 
soon as there is real traffic network gets rather unusuable.

Latency of packets goes from between 0.1ms to 1ms up to 100ms to 500ms (4.14) 
or between 15ms to 90ms (4.18).

Throughput also suffers a lot.

I have a simple test I run after every upgrade. This test basically copies 
with scp large files to 60 different remote locations (involving ipsec), 
limited to 1GBit/s combined, and in paralled I ping from different networks 
over this router to machines in other networks of this router (no ipsec-
tunnels involved).

With 4.9 and earlier copying needs about 2 minutes and the pings all remain 
under 2ms roundtrip. 

With 4.14 copying these files needs more than one our. The roundtrip time of 
the ping is > 1 second.

With 4.18 this is much better, copying needs around 6 minutes and ping 
roundtrip is between 30ms and 180ms. But even that is much worse then 4.9.

I think this dramatic loss in performance is due to the removal of the flow 
cache. I propose to revert that for 4.14. I also propose to revert it for the 
next longterm kernel if no other solution is found bringing back 4.9 
performance (at least about the same order of magnitude).

Maybe it should generally be reverted until a solution to the problem exists.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ