Message-ID: <4921539B.2000002@cosmosbay.com>
Date: Mon, 17 Nov 2008 12:20:59 +0100
From: Eric Dumazet <dada1@...mosbay.com>
To: Ingo Molnar <mingo@...e.hu>
CC: David Miller <davem@...emloft.net>, rjw@...k.pl,
linux-kernel@...r.kernel.org, kernel-testers@...r.kernel.org,
cl@...ux-foundation.org, efault@....de, a.p.zijlstra@...llo.nl,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [Bug #11308] tbench regression on each kernel release from 2.6.22 -> 2.6.28

Ingo Molnar wrote:
> * David Miller <davem@...emloft.net> wrote:
>
>> From: Ingo Molnar <mingo@...e.hu>
>> Date: Mon, 17 Nov 2008 10:06:48 +0100
>>
>>> * Rafael J. Wysocki <rjw@...k.pl> wrote:
>>>
>>>> This message has been generated automatically as a part of a report
>>>> of regressions introduced between 2.6.26 and 2.6.27.
>>>>
>>>> The following bug entry is on the current list of known regressions
>>>> introduced between 2.6.26 and 2.6.27. Please verify if it still should
>>>> be listed and let me know (either way).
>>>>
>>>>
>>>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11308
>>>> Subject : tbench regression on each kernel release from 2.6.22 -> 2.6.28
>>>> Submitter : Christoph Lameter <cl@...ux-foundation.org>
>>>> Date : 2008-08-11 18:36 (98 days old)
>>>> References : http://marc.info/?l=linux-kernel&m=121847986119495&w=4
>>>> http://marc.info/?l=linux-kernel&m=122125737421332&w=4
>>> Christoph, as per the recent analysis of Mike:
>>>
>>> http://fixunix.com/kernel/556867-regression-benchmark-throughput-loss-a622cf6-f7160c7-pull.html
>>>
>>> all scheduler components of this regression have been eliminated.
>>>
>>> In fact his numbers show that scheduler speedups since 2.6.22 have
>>> offset and hidden most other sources of tbench regression. (i.e. the
>>> scheduler portion got 5% faster, hence it was able to offset a
>>> slowdown of 5% in other areas of the kernel that tbench triggers)
>> Although I respect the improvements, wake_up() is still several
>> orders of magnitude slower than it was in 2.6.22 and wake_up() is at
>> the top of the profiles in tbench runs.
>
> hm, several orders of magnitude slower? That contradicts Mike's
> numbers and my own numbers and profiles as well: see below.
>
> The scheduler's overhead barely even registers on a 16-way x86 system
> i'm running tbench on. Here's the NMI profile during 64 threads tbench
> on a 16-way x86 box with an v2.6.28-rc5 kernel [config attached]:
>
> Throughput 3437.65 MB/sec 64 procs
> ==================================
> 21570252 total
> ........
> 1494803 copy_user_generic_string
> 998232 sock_rfree
> 491471 tcp_ack
> 482405 ip_dont_fragment
> 470685 ip_local_deliver
> 436325 constant_test_bit [ called by napi_disable_pending() ]
> 375469 avc_has_perm_noaudit
> 347663 tcp_sendmsg
> 310383 tcp_recvmsg
> 300412 __inet_lookup_established
> 294377 system_call
> 286603 tcp_transmit_skb
> 251782 selinux_ip_postroute
> 236028 tcp_current_mss
> 235631 schedule
> 234013 netif_rx
> 229854 _local_bh_enable_ip
> 219501 tcp_v4_rcv
>
> [ etc. - see full profile attached further below ]
>
> Note that the scheduler does not even show up in the profile up to
> entry #15!
>
> I've also summarized NMI profiler output by major subsystems:
>
> NET overhead (12603450/21570252): 58.43%
> security overhead ( 1903598/21570252): 8.83%
> usercopy overhead ( 1753617/21570252): 8.13%
> sched overhead ( 1599406/21570252): 7.41%
> syscall overhead ( 560487/21570252): 2.60%
> IRQ overhead ( 555439/21570252): 2.58%
> slab overhead ( 492421/21570252): 2.28%
> timer overhead ( 226573/21570252): 1.05%
> pagealloc overhead ( 192681/21570252): 0.89%
> PID overhead ( 115123/21570252): 0.53%
> VFS overhead ( 107926/21570252): 0.50%
> pagecache overhead ( 62552/21570252): 0.29%
> gtod overhead ( 38651/21570252): 0.18%
> IDLE overhead ( 0/21570252): 0.00%
> ---------------------------------------------------------
> left ( 1349494/21570252): 6.26%
>
> The scheduler's functions are absolutely flat, and consistent with an
> extreme context-switching rate of 1.35 million per second. The
> scheduler can go up to about 20 million context switches per second on
> this system:
>
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
> r b swpd free buff cache si so bi bo in cs us sy id wa st
> 32 0 0 32229696 29308 649880 0 0 0 0 164135 20026853 24 76 0 0 0
> 32 0 0 32229752 29308 649880 0 0 0 0 164203 20032770 24 76 0 0 0
> 32 0 0 32229752 29308 649880 0 0 0 0 164201 20036492 25 75 0 0 0
>
> ... and 7% scheduling overhead is roughly consistent with 1.35/20.0.
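[ Spelling out the arithmetic: at 1.35M switches/sec against the
  ~20M/sec ceiling measured above,

	1.35 / 20.0 = 0.0675  ->  ~6.75%

  which is indeed in the neighbourhood of the 7.41% sched overhead
  from the summary. ]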
>
> Wake-up affinities and data flow caching are just fine in this workload
> - we've got scheduler statistics for that and they look good too.
>
> It all looks like pure old-fashioned straight overhead in the
> networking layer to me. Do we still touch the same global cacheline
> for every localhost packet we process? Anything like that would show
> up big time.
Yes we do. I find it strange that we don't see dst_release() in your
NMI profile.

I posted a patch (commit 5635c10d976716ef47ae441998aeae144c7e7387,
"net: make sure struct dst_entry refcount is aligned on 64 bytes",
in the net-next-2.6 tree) to properly align the struct dst_entry
refcounter, and got a 4% tbench speedup on my machine.
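To illustrate the false-sharing problem the alignment patch attacks
(a sketch only; the field names and layout here are made up, not the
actual net-next change):

	/*
	 * Keep the heavily written refcount on its own 64-byte cache
	 * line, so that dst_hold()/dst_release() on one CPU do not
	 * keep invalidating the read-mostly fields used by all the
	 * other CPUs.
	 */
	struct dst_entry_sketch {
		struct net_device	*dev;		/* read-mostly */
		unsigned long		expires;	/* read-mostly */
		/* ... more read-mostly fields ... */

		atomic_t __refcnt ____cacheline_aligned_in_smp;
	};

When the refcounter shares a cache line with read-mostly data, every
localhost packet bounces that line between CPUs - exactly the kind of
global-cacheline touching asked about above.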
There are small speedups too with commit
ef711cf1d156428d4c2911b8c86c6ce90519dc45 ("net: speedup dst_release()").
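The idea there is to touch the refcount cache line once, with a single
atomic read-modify-write, instead of an atomic_read() followed by an
atomic_dec() (a sketch of the change, not necessarily the exact
committed code):

	void dst_release(struct dst_entry *dst)
	{
		if (dst) {
			int newrefcnt;

			smp_mb__before_atomic_dec();
			/* one atomic RMW replaces atomic_read() + atomic_dec() */
			newrefcnt = atomic_dec_return(&dst->__refcnt);
			WARN_ON(newrefcnt < 0);
		}
	}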
Also in net-next-2.6, there are patches that avoid dirtying last_rx on
net devices (loopback, for example); that helps tbench a lot too.
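Those patches boil down to skipping the store when the value cannot
have changed, something like this (sketch):

	/*
	 * last_rx only has jiffies resolution, so write it at most
	 * once per tick instead of dirtying the cache line on every
	 * received packet.
	 */
	if (dev->last_rx != jiffies)
		dev->last_rx = jiffies;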