Date:	Wed, 14 May 2008 17:22:53 +0800
From:	"Zhang, Yanmin" <yanmin_zhang@...ux.intel.com>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	Mike Galbraith <efault@....de>, vatsa@...ux.vnet.ibm.com,
	Dhaval Giani <dhaval@...ux.vnet.ibm.com>,
	LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>,
	Aneesh Kumar KV <aneesh.kumar@...ux.vnet.ibm.com>
Subject: Re: volanoMark regression with kernel 2.6.26-rc1


On Mon, 2008-05-12 at 11:20 +0200, Peter Zijlstra wrote:
> On Mon, 2008-05-12 at 11:04 +0200, Mike Galbraith wrote:
> > On Mon, 2008-05-12 at 13:02 +0800, Zhang, Yanmin wrote:
> > 
> > > A quick update:
> > > With 2.6.26-rc2 (CONFIG_USER_SCHED=y), the volanoMark result on my 8-core stoakley
> > > is about 10% worse than that of 2.6.26-rc1.
> > 
> > Here (Q6600), 2.6.26-rc2 CONFIG_USER_SCHED=y regression culprit for
> > volanomark is the same one identified for mysql+oltp.
> > 
> > (i have yet to figure out where the buglet lies, but there is definitely
> > one in there somewhere)
> > 
> Yeah, I expect that when you create some groups and move everything down
> 1 level you'll get into the same problems as with user grouping.
> 
> The thing seems to be that rq weights shrink to < 1 task level in these
> situations - because it's spreading 1 task's (well, group's) worth of load
> over the various CPUs.
> 
> We're going through the load balance code atm to find out where the
> small load numbers would affect decisions.
> 
> It looks like things like find_busiest_group() just think everything is
> peachy when the imbalance is < 1 task - which with all this grouping
> stuff is not necessarily true.
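To make that concrete, here is a toy calculation (my own illustration, not
kernel code; it assumes NICE_0_LOAD = 1024, i.e. one nice-0 task's worth of
weight, spread over the 8 cores of the stoakley box):

	#include <stdio.h>

	int main(void)
	{
		unsigned long group_weight = 1024;	/* NICE_0_LOAD: one task's worth */
		int cpus = 8;				/* 8-core stoakley */

		/* Each per-CPU rq sees only a fraction of one task's
		 * weight, so find_busiest_group() computes an imbalance
		 * smaller than one task and thinks everything is peachy. */
		printf("per-rq weight: %lu\n", group_weight / cpus);	/* 128 */
		return 0;
	}
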
In case I might have misled you on the find_busiest_group path, I did more
testing and collected data on both hackbench and volanoMark.

I reran hackbench against 2.6.25, 2.6.26-rc2, and 2.6.26-rc2+slub_reverse,
because 2.6.26-rc includes Christoph's multi-page-size SLUB patch, which could
improve hackbench. The test machine is my 8-core stoakley.
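
(For reference, a rough sketch of what this workload does; this is my own
hypothetical reduction, not the real tool. "hackbench 100 process 2000" forks
groups of sender/receiver processes that exchange many small messages over
socketpairs, hammering both the scheduler and the slab allocator. The real
hackbench uses 20 senders and 20 receivers per group; this sketch uses one
pair per group for brevity and ignores short reads/writes.)

	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <sys/socket.h>
	#include <sys/wait.h>

	#define GROUPS 100	/* "100" in "hackbench 100 process 2000" */
	#define LOOPS  2000	/* "2000": messages per sender/receiver pair */
	#define MSGSZ  100	/* hackbench sends 100-byte messages */

	int main(void)
	{
		for (int g = 0; g < GROUPS; g++) {
			int sv[2];

			if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
				perror("socketpair");
				exit(1);
			}
			if (fork() == 0) {		/* sender */
				char buf[MSGSZ] = { 0 };
				for (int i = 0; i < LOOPS; i++)
					write(sv[0], buf, MSGSZ);
				exit(0);
			}
			if (fork() == 0) {		/* receiver */
				char buf[MSGSZ];
				for (int i = 0; i < LOOPS; i++)
					read(sv[1], buf, MSGSZ);
				exit(0);
			}
			close(sv[0]);
			close(sv[1]);
		}
		while (wait(NULL) > 0)			/* reap all children */
			;
		return 0;
	}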

All kernels are compiled with these options:
CONFIG_LOG_BUF_SHIFT=17
# CONFIG_CGROUPS is not set
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_GROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_RT_GROUP_SCHED is not set
CONFIG_USER_SCHED=y
# CONFIG_CGROUP_SCHED is not set
CONFIG_SYSFS_DEPRECATED=y

			| hackbench 100 process 2000	| hackbench 100 process 10000
-------------------------------------------------------------------------------
2.6.25			|	35 seconds		|	182 seconds
-------------------------------------------------------------------------------
2.6.26-rc2		|	28.5 seconds		|	140 seconds
-------------------------------------------------------------------------------
2.6.26-rc2+slub_reverse	|	32 seconds		|	160 seconds
-------------------------------------------------------------------------------

So even if we discount the SLUB patch improvement, 2.6.26-rc2 still shows some
improvement on hackbench. I am not sure whether that improvement is related to
the scheduler.


Then I collected schedule() caller information while running volanoMark. Data
was collected for 20 seconds during the test.

Below is the gprof output with kernel 2.6.25 using the config options above.
(In the call-graph entries that follow, N/M on a caller line means N of the
callee's M total calls came from that caller.)
                0.00    0.00    2962/19804016     retint_careful [16339]
                0.00    0.00    3234/19804016     sys_rt_sigsuspend [20024]
                0.00    0.00    4960/19804016     lock_sock_nested [11240]
                0.00    0.00    8957/19804016     sysret_careful [20253]
                0.00    0.00   28507/19804016     cpu_idle [4340]
                0.00    0.00 2137406/19804016     futex_wait [8065]
                0.00    0.00 4400980/19804016     schedule_timeout [2]
                0.00    0.00 13213237/19804016     sys_sched_yield [20035]
[1]      0.0    0.00    0.00 19804016         schedule [1]
-----------------------------------------------
                0.00    0.00       1/4400980     cifs_oplock_thread [3727]
                0.00    0.00       2/4400980     cifs_dnotify_thread [3700]
                0.00    0.00       2/4400980     inet_csk_accept [9461]
                0.00    0.00      29/4400980     do_select [5468]
                0.00    0.00 4400946/4400980     sk_wait_data [18983]
[2]      0.0    0.00    0.00 4400980         schedule_timeout [2]
                0.00    0.00 4400980/19804016     schedule [1]


Below is the gprof output with kernel 2.6.26-rc2 using the same config options.
                0.00    0.00    3035/12423442     sys_rt_sigsuspend [20387]
                0.00    0.00    7862/12423442     lock_sock_nested [11424]
                0.00    0.00   31105/12423442     __cond_resched [23242]
                0.00    0.00  135653/12423442     retint_careful [16627]
                0.00    0.00  180994/12423442     cpu_idle [4411]
                0.00    0.00  506419/12423442     sysret_careful [20620]
                0.00    0.00 1657696/12423442     futex_wait [8211]
                0.00    0.00 3062197/12423442     schedule_timeout [2]
                0.00    0.00 6836914/12423442     sys_sched_yield [20398]
[1]      0.0    0.00    0.00 12423442         schedule [1]
-----------------------------------------------
                0.00    0.00       1/3062197     cifs_dnotify_thread [3781]
                0.00    0.00       2/3062197     sk_stream_wait_memory [19336]
                0.00    0.00      29/3062197     do_select [5561]
                0.00    0.00 3062165/3062197     sk_wait_data [19338]
[2]      0.0    0.00    0.00 3062197         schedule_timeout [2]
                0.00    0.00 3062197/12423442     schedule [1]


So with kernel 2.6.25, about 66% of the calls to schedule() come from
sys_sched_yield (13213237/19804016), but with kernel 2.6.26-rc2 only about 55%
do (6836914/12423442). The sysret_careful/retint_careful counts represent
involuntary reschedules, and 2.6.25 has far fewer of them than 2.6.26-rc2.


Below is the gprof output with kernel 2.6.26-rc2 (CONFIG_GROUP_SCHED=y,
CONFIG_CGROUP_SCHED=y).
                0.00    0.00    2519/20999187     retint_careful [16704]
                0.00    0.00    5899/20999187     lock_sock_nested [11494]
                0.00    0.00   27059/20999187     sysret_careful [20697]
                0.00    0.00   73569/20999187     cpu_idle [4473]
                0.00    0.00 2360268/20999187     futex_wait [8275]
                0.00    0.00 4755337/20999187     schedule_timeout [2]
                0.00    0.00 13769085/20999187     sys_sched_yield [20475]
[1]      0.0    0.00    0.00 20999187         schedule [1]
-----------------------------------------------
                0.00    0.00       1/4755337     cifs_dnotify_thread [3837]
                0.00    0.00       2/4755337     inet_csk_accept [9697]
                0.00    0.00      31/4755337     do_select [5624]
                0.00    0.00 4755303/4755337     sk_wait_data [19414]
[2]      0.0    0.00    0.00 4755337         schedule_timeout [2]
                0.00    0.00 4755337/20999187     schedule [1]
-----------------------------------------------

volanoMark needs /proc/sys/kernel/sched_compat_yield=1 (e.g. echo 1 >
/proc/sys/kernel/sched_compat_yield before the run).
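
That knob matters because, as the profiles above show, most schedule() traffic
comes from sys_sched_yield. A minimal user-space analogue of that pattern (my
own sketch; volanoMark itself is a Java chat benchmark whose threads end up
yielding heavily) looks like:

	#include <pthread.h>
	#include <sched.h>

	#define NTHREADS 8
	#define YIELDS   100000

	static void *yielder(void *arg)
	{
		/* Each call enters sys_sched_yield, which in turn calls
		 * schedule() - the dominant path in the profiles above. */
		for (int i = 0; i < YIELDS; i++)
			sched_yield();
		return NULL;
	}

	int main(void)
	{
		pthread_t t[NTHREADS];

		for (int i = 0; i < NTHREADS; i++)
			pthread_create(&t[i], NULL, yielder, NULL);
		for (int i = 0; i < NTHREADS; i++)
			pthread_join(t[i], NULL);
		return 0;
	}

(Build with gcc -pthread.) With sched_compat_yield=1 the yielding task is
requeued behind the other runnable tasks, as the old O(1) scheduler did; with
0, CFS barely moves it, so yield-happy workloads like this make little progress
per yield.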

Perhaps the above info provides some clues? Maybe some change in 2.6.26-rc2
has an impact on sys_sched_yield?

yanmin


