Date:	Wed, 6 Mar 2013 10:52:05 +0100
From:	Johannes Rudolph <johannes.rudolph@...glemail.com>
To:	netdev@...r.kernel.org
Subject: Spinlock spinning in __inet_hash_connect

Hello all,

I hope I'm on the right mailing list for raising this issue. We are
seeing a problem while running a load test with JMeter against a web
server [1]. The test suite uses 50 threads to connect to a localhost
web server, runs one HTTP request per connection, and then loops.
After the test has been running for about 10 seconds (~100,000
connections established/closed), CPU load goes up and the connection
rate drops massively (see [1] for a chart). With `perf top` I'm
observing this on the _client_ side:

 41.39%  [kernel]                                    [k] __ticket_spin_lock
 16.83%  [kernel]                                    [k] __inet_check_established
 12.50%  [kernel]                                    [k] __inet_hash_connect
  4.35%  [kernel]                                    [k] __ticket_spin_unlock

I've also recorded a call graph, a log of which you can find at [2].
This was on Ubuntu 12.10, Linux 3.6.3-030603-generic x86_64. The same
test run against another web server doesn't show this behavior in
this particular setup.
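
For reference, this is roughly what each client thread is doing per
connection (a minimal sketch in C, not the actual JMeter code; the
127.0.0.1:8080 target and the request line are placeholders):

#include <stdio.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
	const char req[] =
		"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n";
	char buf[4096];
	struct sockaddr_in addr = { 0 };

	addr.sin_family = AF_INET;
	addr.sin_port = htons(8080);	/* placeholder port */
	inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

	for (;;) {	/* the test runs 50 of these loops in parallel */
		int fd = socket(AF_INET, SOCK_STREAM, 0);

		if (fd < 0 ||
		    connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
			perror("socket/connect");
			return 1;
		}
		write(fd, req, sizeof(req) - 1);
		while (read(fd, buf, sizeof(buf)) > 0)
			;		/* drain the response */
		close(fd);		/* active close: the socket ends up in
					 * TIME_WAIT, pinning its ephemeral
					 * port until the timeout expires */
	}
}

Every connect() without an explicit bind() has to pick a fresh
ephemeral port, which is the path through __inet_hash_connect /
__inet_check_established that shows up in the profile above.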

I've found a related issue for a totally different application and
setup [3]. The problem seems related to handing out ephemeral ports
when only few of them are available, and (I guess) heavy contention
on the lock protecting the ephemeral port hash table. As suggested in
[3], setting `tcp_tw_reuse=1` seems to fix the issue in this
particular test case, but that may only be because it takes pressure
off the available ports.
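
For completeness, the workaround boils down to
`sysctl -w net.ipv4.tcp_tw_reuse=1`, or equivalently (a sketch of the
same /proc write; needs root):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* same effect as `sysctl -w net.ipv4.tcp_tw_reuse=1` */
	int fd = open("/proc/sys/net/ipv4/tcp_tw_reuse", O_WRONLY);

	if (fd < 0 || write(fd, "1", 1) != 1) {
		perror("tcp_tw_reuse");
		return 1;
	}
	close(fd);

	/* widening /proc/sys/net/ipv4/ip_local_port_range should relieve
	 * the same pressure without reusing TIME_WAIT sockets, which might
	 * help distinguish port exhaustion from lock contention */
	return 0;
}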

Before doing more research I wanted to put this here for the record
and ask for suggestions on how to proceed. What I could do:

 * run the test on a more recent kernel (3.8.2)
 * provide instructions on how to reproduce the behavior
 * upload the `perf report` output if that helps

Thanks,

--
Johannes

[1] https://groups.google.com/d/topic/spray-user/76klWTHtsr4/discussion
[2] https://gist.github.com/jrudolph/5098113
[3] https://bugs.launchpad.net/percona-playback/+bug/1059330

-----------------------------------------------
Johannes Rudolph
http://virtual-void.net
