Message-Id: <1219981101.8781.123.camel@ymzhang>
Date: Fri, 29 Aug 2008 11:38:21 +0800
From: "Zhang, Yanmin" <yanmin_zhang@...ux.intel.com>
To: Ingo Molnar <mingo@...e.hu>
Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Dhaval Giani <dhaval@...ux.vnet.ibm.com>,
LKML <linux-kernel@...r.kernel.org>,
Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>,
Aneesh Kumar KV <aneesh.kumar@...ux.vnet.ibm.com>,
Balbir Singh <balbir@...ibm.com>,
Chris Friesen <cfriesen@...tel.com>
Subject: Re: VolanoMark regression with 2.6.27-rc1
On Fri, 2008-08-29 at 11:35 +0800, Zhang, Yanmin wrote:
> On Thu, 2008-08-21 at 14:48 +0800, Zhang, Yanmin wrote:
> > On Thu, 2008-08-21 at 08:16 +0200, Ingo Molnar wrote:
> > > * Zhang, Yanmin <yanmin_zhang@...ux.intel.com> wrote:
> > >
> > > > > ok, i've applied this one to tip/sched/urgent instead of the
> > > > > feature-disabling patchlet. Yanmin, could you please check whether this
> > > > > one does the trick?
> > > >
> > > > > This new patch barely helps volanoMark. Please use the patch
> > > > which sets LB_BIAS=1 by default.
> > >
> > > ok. That also removes the kernel.h complications ;-)
> > Sorry, I have a new update.
> > Originally, I worked on 2.6.27-rc1. I just moved to 2.6.27-rc3 and found
> > something different when CONFIG_GROUP_SCHED=n.
> >
> > With 2.6.27-rc3, on my 8-core stoakley, the whole volanoMark regression disappears,
> > whether or not I enable LB_BIAS. On the 16-core tigerton, the regression is still
> > there if I don't enable LB_BIAS, but it shrinks from 65% to 11%.
> I have new updates on this regression. I checked the volanoMark web page and
> found that the client command line has the options rooms and users. rooms is the number
> of chat rooms that will be started. users is the number of users in each room. The default
> rooms is 10 and users is 20, so every room has about 800 threads.
Sorry, every room has 80 threads.
> As all threads of a
> room communicate only within that room, the number of rooms is important.
>
> All my previous volanoMark testing used the default rooms 10 and users 20. With wake_affine
> in the kernel, waker and sleeper are gradually moved to the same cpu. However, if the number of
> rooms is not a multiple of the cpu number, load balancing makes the kernel move threads from
> one cpu to another continually. If there are so many threads that the cache-hot
> effect is weakened, load balance is more important. But if there are not too many threads running,
> cache-hot is more important than load balance. Should we prefer wake_affine more?
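
A rough sketch of the arithmetic behind the paragraph above. It uses the 80 threads per room quoted in this message (at the default users=20). The "leftover rooms" figure is only an illustrative assumption: it supposes that wake_affine tends to gather each room onto one cpu, so that rooms which do not divide evenly across cpus are the ones load balancing keeps moving. The function and key names are made up for this example.

```python
# Threads per room at the default users=20, as stated in this message.
THREADS_PER_ROOM = 80


def room_layout(rooms, cpus, threads_per_room=THREADS_PER_ROOM):
    """Return thread counts for a given rooms/cpus combination.

    leftover_rooms is a hypothetical indicator, not a kernel metric:
    it is the number of rooms left over once rooms are spread evenly
    across cpus (rooms modulo cpus).
    """
    total = rooms * threads_per_room
    return {
        "total_threads": total,
        "threads_per_cpu": total / cpus,
        "leftover_rooms": rooms % cpus,
    }


if __name__ == "__main__":
    # Room counts used in the 8-core stoakley table.
    for rooms in (8, 10, 16, 32):
        print(rooms, room_layout(rooms, cpus=8))
```

With the default 10 rooms on the 8-core machine this gives 800 threads in total and 2 leftover rooms, while 8 and 16 rooms leave none.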
>
> Below is some data I collected from numerous test runs on 3 machines.
>
>
> On 2-quadcore processor stoakley (8-core):
> kernel\rooms | 8 | 10 | 16 | 32
> -------------------------------------------------------------------------------------------
> 2.6.26_nogroup | 385617 | 351247 | 323324 | 231934
> -------------------------------------------------------------------------------------------
> 2.6.27-rc4_nogroup | 359124 | 336984 | 335180 | 235258
> -------------------------------------------------------------------------------------------
> 2.6.26group | 381425 | 343636 | 312280 | 179673
> -------------------------------------------------------------------------------------------
> 2.6.27-rc4group | 212112 | 270000 | 300188 | 228465
> -------------------------------------------------------------------------------------------
>
>
> On 2-quadcore+HT processor new x86_64 (8-core+HT, total 16 threads):
> kernel\rooms | 10 | 16 | 24 | 32 | 64
> -------------------------------------------------------------------------
> 2.6.26_nogroup | 667668 | 671860 | 671662 | 621900 | 509482
> -------------------------------------------------------------------------
> 2.6.27-rc4_nogroup | 732346 | 800290 | 709272 | 648561 | 497243
> -------------------------------------------------------------------------
> 2.6.26group | 705579 | 759464 | 693697 | 636019 | 500744
> -------------------------------------------------------------------------
> 2.6.27-rc4group | 572426 | 674977 | 627410 | 590984 | 445651
> -------------------------------------------------------------------------
>
>
> On 4-quadcore tigerton processor (16-core) (testing with 32 rooms isn't stable on this machine, so there is no 32 column):
> kernel\rooms | 8 | 10 | 16
> ------------------------------------------------------------------
> 2.6.26_nogroup | 346410 | 382938 | 349405
> ------------------------------------------------------------------
> 2.6.27-rc4_nogroup | 359124 | 336984 | 335180
> ------------------------------------------------------------------
> 2.6.26group | 504802 | 376513 | 319020
> ------------------------------------------------------------------
> 2.6.27-rc4group | 247652 | 284784 | 355132
> ------------------------------------------------------------------
>
> I also tried different users values with rooms 8 and found that the results for users 20/40/60 are very close.
>
> With group scheduling, 2.6.26 is mostly better than 2.6.27-rc4.
> Without group scheduling, the result depends on the specific machine.
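
To make the comparison above easier to read, here is a small sketch that turns the 8-core stoakley table into percent changes from 2.6.26 to 2.6.27-rc4. The numbers are copied from that table. The helper name is made up for this example, and it assumes that a higher volanoMark score is better, which matches how the regression is discussed in this thread.

```python
# Room counts and scores copied from the 8-core stoakley table.
ROOMS = (8, 10, 16, 32)
STOAKLEY = {
    "2.6.26_nogroup":     (385617, 351247, 323324, 231934),
    "2.6.27-rc4_nogroup": (359124, 336984, 335180, 235258),
    "2.6.26group":        (381425, 343636, 312280, 179673),
    "2.6.27-rc4group":    (212112, 270000, 300188, 228465),
}


def percent_change(old, new):
    """Percent change per room count; negative means the new kernel scores lower."""
    return {r: 100.0 * (n - o) / o for r, o, n in zip(ROOMS, old, new)}


group = percent_change(STOAKLEY["2.6.26group"], STOAKLEY["2.6.27-rc4group"])
nogroup = percent_change(STOAKLEY["2.6.26_nogroup"], STOAKLEY["2.6.27-rc4_nogroup"])

if __name__ == "__main__":
    for label, result in (("group", group), ("nogroup", nogroup)):
        for rooms, pct in result.items():
            print(f"{label:8s} rooms={rooms:2d} change={pct:+.1f}%")
```

On this machine the group scheduling case is lower on 2.6.27-rc4 for rooms 8, 10 and 16 and higher only for rooms 32, which fits the "mostly" above.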
>
> I also reran hackbench with group 10/16/32, and found that the result difference between the 2 kernels
> varies among group 10/16/32.
>
> What's the most reasonable group/rooms value we should use for testing?
>
> On the other hand, tbench (starting CPU_NUM*2 clients) has about a 4~5% regression with 2.6.27-rc kernels.
> With 30 seconds of schedstat data collected during the testing, I found there is almost no wake remote or wake
> affine with 2.6.26, but there are many of either wake_affine or wake remote with 2.6.27-rc.