Message-Id: <6.2.5.6.2.20111004214105.03a9d9f0@binnacle.cx>
Date: Tue, 04 Oct 2011 23:35:11 -0400
From: starlight@...nacle.cx
To: Joe Perches <joe@...ches.com>, Christoph Lameter <cl@...two.org>,
Serge Belyshev <belyshev@...ni.sinp.msu.ru>,
Con Kolivas <kernel@...ivas.org>
Cc: Eric Dumazet <eric.dumazet@...il.com>,
linux-kernel@...r.kernel.org, netdev <netdev@...r.kernel.org>,
Willy Tarreau <w@....eu>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Stephen Hemminger <stephen.hemminger@...tta.com>
Subject: Re: big picture UDP/IP performance question re 2.6.18 -> 2.6.32
At 12:38 PM 10/4/2011 -0700, Joe Perches wrote:
>On Tue, 2011-10-04 at 14:16 -0500, Christoph Lameter wrote:
>> On Mon, 3 Oct 2011, starlight@...nacle.cx wrote:
>> > I've come to the conclusion that Eric is right
>> > and the primary issue is an increase in the
>> > cost of scheduler context switches. Have
>> > been watching this number and it has held
>> > pretty close to 200k/sec under all scenarios
>> > and kernel versions, so it has to be
>> > a longer code-path, bigger cache pressure
>> > or both in the scheduler. Sadly this makes
>> > newer kernels a no-go for us.
>>
>> We had similar experiences. Basically latency
>> constantly gets screwed up by the new fancy
>> features being added to the scheduler and network
>> subsystem (most notorious is the new "fair"
>> scheduler, 2.6.23 made a big step down).
>
>Idly curious, have you compared bfs performance?
>http://ck.kolivas.org/patches/bfs/bfs-faq.txt
Gave it a try. The application runs with a
large thread pool and the O(N) run-queue scan
in BFS does not like that: CPU usage was
30-50% higher. Wouldn't fly in production.
HOWEVER, when the application is run in its
small thread-pool mode, total CPU consumption
with the BFS-patched 2.6.39.4 kernel was just
5% greater than with the 2.6.18(rhel5) kernel.
This result is certainly interesting and
suggests that the more complex scheduler in
the newer kernels is to blame for the loss
of performance.
The picture is not perfectly clear, as the
reported balance of user and system CPU is
dramatically different. With the 2.6.18(rhel5)
kernel the percentage of CPU attributed to
system overhead was 23%, the IRQ rate was
33k/sec and the context switch rate was about
200k/sec. With the 2.6.39.4(bfs) kernel system
overhead was reported as just 3% of the
application total, with user time exceeding
the 2.6.18(rhel5) total of user and system
combined. IRQs ran at a much higher 160k/sec
and context switches at a significantly lower
110k/sec. The above user/sys numbers were
calculated from the /proc/<pid>/stat values
for the application. The number of active
threads was 13 and the number of cores was 12.
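For reference, the calculation is roughly the
following (a minimal sketch of the idea, not the
exact script used; the PID and sampling interval
are placeholders):

    import os, time

    def cpu_ticks(pid):
        # /proc/<pid>/stat covers all threads of the process; comm
        # (field 2) can contain spaces, so parse from the last ')'.
        with open("/proc/%d/stat" % pid) as f:
            rest = f.read().rsplit(")", 1)[1].split()
        return int(rest[11]), int(rest[12])   # utime, stime (fields 14, 15)

    pid, interval = 12345, 10.0               # placeholders
    hz = os.sysconf("SC_CLK_TCK")             # ticks per second, usually 100
    u0, s0 = cpu_ticks(pid)
    time.sleep(interval)
    u1, s1 = cpu_ticks(pid)
    print("user %.1f%%  sys %.1f%%" % (100.0 * (u1 - u0) / hz / interval,
                                       100.0 * (s1 - s0) / hz / interval))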
Another observation is that the CPU numbers
reported by 'top' and 'vmstat' diverged
more than usual. 'top' steadily reported
600-ish percent (of the 1200% possible) while
'vmstat' was showing numbers all over the map,
including near-zero utilization. Apparently
an extreme case of aliasing in the clock-tick
based sampling these tools rely on.
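One way to cross-check this, rather than trusting
either tool's per-tick sampling, is to difference
the aggregate 'cpu' counters in /proc/stat over a
longer window. A rough sketch (the 30-second
window is arbitrary):

    import time

    def read_cpu():
        # First line of /proc/stat: "cpu user nice system idle iowait ..."
        with open("/proc/stat") as f:
            return [int(x) for x in f.readline().split()[1:]]

    window = 30.0                             # seconds; arbitrary
    a = read_cpu()
    time.sleep(window)
    b = read_cpu()
    d = [y - x for x, y in zip(a, b)]
    idle = d[3] + (d[4] if len(d) > 4 else 0) # idle + iowait
    print("busy: %.1f%% of total capacity" % (100.0 * (sum(d) - idle) / sum(d)))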
I'm attaching the 'perf' report for this run.
It should provide some interesting comparisons
with the 'perf' report from the unmodified
2.6.39.4 kernel test run earlier in this thread,
though note that that test was run with the
large thread pool of about 140 active threads.
The config is attached as well.
-----
Anyway, this is academic for us and the
production kernel will remain 2.6.18(rhel5) for
some time.
When core counts start getting up into the
hundreds or thousands, the doubling of system
overhead in newer kernels probably won't matter
if that's what it takes to make them run well.
It will be interesting to see if the Tile-GX
100-core CPU shows up soon and works even
remotely as well as they say it will.
View attachment "perf_report_bfs_pool0.txt" of type "text/plain" (334784 bytes)
View attachment "config_2.6.39.4_bfs.txt" of type "text/plain" (111484 bytes)