linux-kernel - Re: big picture UDP/IP performance question re 2.6.18 -> 2.6.32

Message-Id: <6.2.5.6.2.20111003112108.03a83a28@binnacle.cx>
Date:	Mon, 03 Oct 2011 11:25:31 -0400
From:	starlight@...nacle.cx
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	linux-kernel@...r.kernel.org, netdev <netdev@...r.kernel.org>,
	Willy Tarreau <w@....eu>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Stephen Hemminger <stephen.hemminger@...tta.com>
Subject: Re: big picture UDP/IP performance question re 2.6.18
  -> 2.6.32

Ran the 'nohalt' and 'idle=poll' tests and they
were a complete bust.  'nohalt' gave slightly
better performance with 2.6.18(rhel) and
slightly worse performance with 2.6.39.27.
'idle=poll' improved 2.6.39.27 by 2.7%,
which is nowhere near enough to justify
running servers at full power, about double
idle power according to the UPS.  'idle=poll'
did cause the idle loop to show up at the top
of the 'perf' list, consuming 45% of the
CPU.  So the one thing it does is let one
see exactly how much absolute CPU each
component consumes during test runs,
rather than a relative number.

I've come to the conclusion that Eric is right
and the primary issue is an increase in the
cost of scheduler context switches.  Have
been watching the context-switch rate and it
has held pretty close to 200k/sec under all
scenarios and kernel versions, so the cost
per switch must have gone up: a longer
code-path, more cache pressure or both in the
scheduler.  Sadly this makes newer kernels a
no-go for us.
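
For anyone wanting to watch the same context-switch
rate, here is a minimal sketch.  It assumes the
cumulative 'ctxt' counter in /proc/stat is an
acceptable source (the tool actually used for the
200k/sec figure is not stated), and the function
names are purely illustrative.

```python
import os
import time


def read_ctxt(stat_text):
    """Return the cumulative context-switch count from /proc/stat text."""
    for line in stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no 'ctxt' line found")


def ctxt_rate(before, after, seconds):
    """Context switches per second between two cumulative samples."""
    return (after - before) / seconds


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    interval = 1.0  # sampling interval in seconds (arbitrary choice)
    with open("/proc/stat") as f:
        first = read_ctxt(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = read_ctxt(f.read())
    print("context switches/sec: %.0f" % ctxt_rate(first, second, interval))
```

A rate that stays flat across kernel versions while
throughput falls is what points at per-switch cost
rather than at the number of switches.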

At 07:47 AM 10/2/2011 -0700, Stephen Hemminger wrote:
>Try disabling PCI DMA remapping. The additional
>overhead of setting up IO mapping can be a
>performance buzz kill. Try CONFIG_DMAR=n

I decided to skip this.  Can't see how it
will make more than a percent or two
difference since the problem is not the
network stack.

At 09:21 AM 10/2/2011 +0200, Eric Dumazet wrote:
>On new kernels, you can check if your udp sockets
>drops frames because of rcvbuffer being full (cat
>/proc/net/udp, check last column 'drops')

Zero, of course.  This info is also
shown lumped in the 'netstat -s'
"packet receive errors" counter.
Have been watching it all along.
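
The per-socket check Eric describes can be scripted.
This is a sketch only: it assumes the /proc/net/udp
layout of kernels that export a trailing 'drops'
column, with the local port in hex after the colon
in the second field.  The function name is
hypothetical.

```python
import os


def udp_drops(proc_net_udp_text):
    """Return a list of (local_port, drops) from /proc/net/udp text."""
    lines = proc_net_udp_text.splitlines()
    if not lines or lines[0].split()[-1] != "drops":
        # Older kernels do not export the per-socket drop counter.
        raise ValueError("no 'drops' column in header")
    result = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        port = int(fields[1].split(":")[1], 16)  # hex port -> int
        drops = int(fields[-1])                  # decimal counter
        result.append((port, drops))
    return result


if __name__ == "__main__" and os.path.exists("/proc/net/udp"):
    with open("/proc/net/udp") as f:
        for port, drops in udp_drops(f.read()):
            if drops:
                print("port %d dropped %d datagrams" % (port, drops))
```

No output means every socket shows zero drops, which
matches the result reported here.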

>To check if softirq processing hit some limits :
>cat /proc/net/softnet_stat

Zero drops.  Some noise-level squeezing.

Nice stat!  Wish I had known about this
four or five years ago.
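
For reference, a small sketch that decodes the file.
It assumes the usual layout of one line per CPU with
hex columns, where the first three are packets
processed, packets dropped and time_squeeze (softirq
budget exhausted).  Later columns vary by kernel
version and are ignored; the function name is
illustrative.

```python
import os


def parse_softnet_stat(text):
    """Return (cpu_index, processed, dropped, time_squeeze) per line."""
    rows = []
    for cpu, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) < 3:
            continue  # not a softnet_stat data line
        processed, dropped, squeezed = (int(x, 16) for x in fields[:3])
        rows.append((cpu, processed, dropped, squeezed))
    return rows


if __name__ == "__main__" and os.path.exists("/proc/net/softnet_stat"):
    with open("/proc/net/softnet_stat") as f:
        for cpu, processed, dropped, squeezed in parse_softnet_stat(f.read()):
            print("cpu%d processed=%d dropped=%d time_squeeze=%d"
                  % (cpu, processed, dropped, squeezed))
```

The index printed is just the line position in the
file, taken here as a stand-in for the CPU number.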

>Please send full "dmesg" output 

Attached.

View attachment "dmesg_c6_263904.txt" of type "text/plain" (65130 bytes)