Message-ID: <3244031.uQGDddGTLF@h2o.as.studentenwerk.mhn.de>
Date:	Tue, 01 Oct 2013 18:39:32 +0200
From:	Wolfgang Walter <linux@...m.de>
To:	netdev@...r.kernel.org
Subject: Big performance loss from 3.4.63 to 3.10.13 when routing ipv4

Hello,

I tried to upgrade one of our routers from 3.4.63 to 3.10.13 and I see a
dramatic performance loss. I also tried 3.11.2 and the problem is still there.

*** Symptoms:

All network traffic over the router becomes slow and sluggish. If one pings the
router there is packet loss. After about 2 minutes the traffic completely
stalls for about 1 minute. Then it works again as in the beginning, only to
stall again, and so on.

This happens even with rather moderate traffic. While the router is still
routing, CPU utilization is higher than with 3.4.63, but only moderately.

When it stalls, no network traffic seems possible (except to loopback). If one
tries to ping any target from the router (even one on an interface with no
traffic at all) one gets:

	ping: sendmsg: No buffer space available

As the router has about 15G free memory this probably means that an internal 
table is full.

CPU utilization is low during that period.
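
The stall/recover cycle described above could be timed with a small probe.
The following is only a sketch, not something used for this report: it assumes
that a UDP send from the router fails with ENOBUFS in the same way ping's
sendmsg does (not verified here), and the function names, the discard port 9
and the sampling interval are all made up for illustration.

```python
import errno
import socket
import time


def probe_once(target, port=9):
    """Send one UDP datagram; return False only if the kernel reports ENOBUFS.

    Assumption: UDP sends hit the same "No buffer space available" error
    that ping shows during a stall. Other errors are re-raised.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.sendto(b"probe", (target, port))
        except OSError as exc:
            if exc.errno == errno.ENOBUFS:
                return False
            raise
    return True


def stall_intervals(samples):
    """Collapse (timestamp, ok) samples into (first_bad, last_bad) intervals."""
    intervals = []
    start = None
    last_bad = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts
            last_bad = ts
        elif start is not None:
            intervals.append((start, last_bad))
            start = None
    if start is not None:
        intervals.append((start, last_bad))
    return intervals


def monitor(target, count=600, interval=1.0):
    """Probe `target` once per `interval` seconds and return stall intervals."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), probe_once(target)))
        time.sleep(interval)
    return stall_intervals(samples)
```

Calling monitor() with the address of any neighbour of the router would then
return one (start, end) pair per stall, which makes it easy to compare the
"about 2 minutes working, about 1 minute stalled" pattern between kernels.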


I can trigger it easily by copying about 50 big files via scp over 50
different ipsec tunnels:

* boot router

* wait until all ipsec tunnels are established

* start copying:

H <--1G--> Router <---1G--->.......<-- >=100MBit --> Xn <---100Mbit----> Rn

So there is an ipsec tunnel between Router and Xn for all n=1 to 50. I copy
files from Rn to H. I start the copy from H, so the tcp connections get
established from H to Rn.
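
The copy step can be sketched roughly as follows, run on H. This is an
illustration only: the host names r1..r50, the remote file path and the
destination directory are hypothetical placeholders, not the names used in
the real setup.

```python
import subprocess


def build_scp_commands(n_tunnels=50, host_fmt="r{}",
                       remote_path="/srv/test/bigfile", dest_dir="/tmp"):
    """Build one scp command per remote host Rn, pulling a file to H.

    host_fmt, remote_path and dest_dir are hypothetical placeholders.
    """
    commands = []
    for n in range(1, n_tunnels + 1):
        source = f"{host_fmt.format(n)}:{remote_path}"
        dest = f"{dest_dir}/bigfile.{n}"
        # -q suppresses the progress meter so 50 copies do not fight
        # over the terminal.
        commands.append(["scp", "-q", source, dest])
    return commands


def run_all(commands):
    """Start all copies at once and wait for them; return the exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]
```

run_all(build_scp_commands()) starts all 50 transfers in parallel from H, so
each tcp connection is opened from H towards Rn as in the test above.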

The same test works just fine with 3.4.63. All cores are used but none
reaches its limit. The router neither drops pings nor has problems
pinging other targets.

I tested 3.8.13. It seems not to have this issue if I increase

	net.ipv4.inet_peer_threshold

(I tried 6566400; I didn't try smaller values besides the default one).

With the default value, 3.8.13 behaves badly.
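
For reference, a minimal sketch of reading and raising that sysctl through
procfs. The helper names are made up for illustration; the path is the usual
/proc/sys mapping of net.ipv4.inet_peer_threshold, and writing to it needs
root.

```python
from pathlib import Path

# procfs file that backs the net.ipv4.inet_peer_threshold sysctl.
INET_PEER_THRESHOLD = Path("/proc/sys/net/ipv4/inet_peer_threshold")


def read_threshold(path=INET_PEER_THRESHOLD):
    """Return the current threshold as an integer."""
    return int(Path(path).read_text().strip())


def write_threshold(value, path=INET_PEER_THRESHOLD):
    """Set a new threshold (requires root on the real procfs file)."""
    if value <= 0:
        raise ValueError("threshold must be a positive integer")
    Path(path).write_text(f"{value}\n")
```

write_threshold(6566400) corresponds to the value tried above; the change is
lost on reboot unless it is also put into the sysctl configuration.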

But 3.8.13 seems to have other issues. Basically: routing stalls later, and
for much longer (up to 6 minutes or so).



*** Environment:

It's an 8-core machine (with AES-NI). It establishes a lot of ipsec tunnels. It
uses stateful packet filtering (but no NAT). The network cards are Intel
cards (drivers: igb and ixgbe). No IPv6. Ethernet flow control is disabled (but
that doesn't matter). No traffic shaping (that is, no tc). igb/ixgbe
interfaces: nothing modified with ethtool except flow control (autoneg off tx
off rx off).
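
The flow control setting mentioned above corresponds to ethtool's pause
options (ethtool -A). A minimal sketch of applying it to several interfaces
follows; the helper names and the interface names in the usage note are
hypothetical, since the real interface names are not given here.

```python
import subprocess


def pause_off_command(iface):
    """Build the ethtool call that disables pause autonegotiation, TX and RX."""
    return ["ethtool", "-A", iface, "autoneg", "off", "tx", "off", "rx", "off"]


def disable_flow_control(ifaces):
    """Run the command for each interface; raises if ethtool fails (needs root)."""
    for iface in ifaces:
        subprocess.run(pause_off_command(iface), check=True)
```

For example, disable_flow_control(["eth0", "eth1"]) would apply the setting
to two placeholder interfaces.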


Any idea?


Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts