Date:	Wed, 13 Aug 2008 16:50:42 +0800
From:	"Zhang, Yanmin" <yanmin_zhang@...ux.intel.com>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	Dhaval Giani <dhaval@...ux.vnet.ibm.com>,
	Ingo Molnar <mingo@...e.hu>,
	LKML <linux-kernel@...r.kernel.org>,
	Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>,
	Aneesh Kumar KV <aneesh.kumar@...ux.vnet.ibm.com>,
	Balbir Singh <balbir@...ibm.com>
Subject: Re: VolanoMark regression with 2.6.27-rc1


On Fri, 2008-08-08 at 09:30 +0200, Peter Zijlstra wrote:
> On Wed, 2008-08-06 at 11:26 +0800, Zhang, Yanmin wrote:
> > On Mon, 2008-08-04 at 09:12 +0200, Peter Zijlstra wrote:
> > > On Mon, 2008-08-04 at 12:35 +0530, Dhaval Giani wrote:
> > > > On Mon, Aug 04, 2008 at 08:26:11AM +0200, Peter Zijlstra wrote:
> > > > > On Mon, 2008-08-04 at 11:23 +0530, Dhaval Giani wrote:
> > > > > 
> > > > > > Peter, vatsa, any ideas?
> > > > > 
> > > > > ---
> > > > > 
> > > > > Revert:
> > > > >   a7be37ac8e1565e00880531f4e2aff421a21c803  sched: revert the revert of: weight calculations
> > > > >   c9c294a630e28eec5f2865f028ecfc58d45c0a5a  sched: fix calc_delta_asym()
> > > > >   ced8aa16e1db55c33c507174c1b1f9e107445865  sched: fix calc_delta_asym, #2
> > > > > 
> > > > 
> > > > Did we not fix those? :) 
> > > 
> > > Works for me,.. just guessing here.
> > I did more investigation on a 16-core Tigerton machine.
> > 
> > Firstly, let's focus on CONFIG_GROUP_SCHED=n. With 2.6.26, the result
> > differs little between CONFIG_GROUP_SCHED=y and CONFIG_GROUP_SCHED=n.
> > 
> > 1) I tried different sched_features and found that AFFINE_WAKEUPS has a big
> > impact on volanoMark. Other features have little impact.
> > 
> > 2) With kernel 2.6.26, the result is 260000 with AFFINE_WAKEUPS disabled
> > and 515000 with it enabled, so AFFINE_WAKEUPS improves the result by about
> > 100%. With kernel 2.6.27-rc1, the improvement is only about 25%.
> > 
> > 3) I turned on CONFIG_SCHEDSTATS in the kernel and collected
> > ttwu_move_affine: I read ttwu_move_affine, read it again 30 seconds later,
> > and calculated the difference. With 2.6.26, I got the data below:
> 
> <snip data>
> 
> > So with kernel 2.6.27-rc1, the number of successful affine wakeups
> > (ttwu_move_affine) is about double that of 2.6.26 on domain 0, but about
> > 10 times that of 2.6.26 on domain 1. That means more tasks are woken up
> > on the waker CPUs.
> > 
> > Does that mean it no longer follows the cache-hot check?
> 
> I'm a bit puzzled, but you're right - I too noticed that volanomark is
> _very_ sensitive to affine wakeups.
> 
> I'll try and find what changed in that code for GROUP=n.
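
Step 3 of the quoted message samples the ttwu_move_affine counter twice, 30 seconds apart, and takes the difference. A minimal C sketch of that per-domain delta, assuming the caller has already read the same `domainN` line of /proc/schedstat before and after the interval. The position of ttwu_move_affine within the line depends on the schedstat version, so it is passed in as a parameter; the helper names `schedstat_field` and `schedstat_delta` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return the field_index-th counter (0-based, counted after the "domainN"
 * tag and the cpumask) from one domain line of /proc/schedstat, or -1 if
 * the line has too few fields. The position of ttwu_move_affine differs
 * between schedstat versions, so the caller must supply it.
 */
static long long schedstat_field(const char *line, int field_index)
{
    char buf[1024];
    char *tok;
    int i = 0;

    /* strtok() modifies its input, so work on a bounded copy. */
    strncpy(buf, line, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    if (strtok(buf, " \t\n") == NULL)      /* skip "domainN" */
        return -1;
    if (strtok(NULL, " \t\n") == NULL)     /* skip the cpumask */
        return -1;

    while ((tok = strtok(NULL, " \t\n")) != NULL) {
        if (i++ == field_index)
            return strtoll(tok, NULL, 10);
    }
    return -1;
}

/* Counter increase between two snapshots of the same domain line. */
static long long schedstat_delta(const char *before, const char *after,
                                 int field_index)
{
    long long a = schedstat_field(before, field_index);
    long long b = schedstat_field(after, field_index);

    if (a < 0 || b < 0)
        return -1;
    return b - a;
}
```

In use, one would read the same domain line before and after a 30-second sleep and pass both strings, together with the field index for the running kernel's schedstat version, to `schedstat_delta`.
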
I collected more data and found that the CPU_NEWLY_IDLE balance schedstats look abnormal.
Compared with 2.6.26, 2.6.27-rc1 has more successful move_tasks between CPU runqueues. I
instrumented the kernel and found that, with 2.6.26, the task is usually cache-hot when the
kernel tries to move it to another CPU, so it is not moved; with 2.6.27-rc1, the task is often
moved successfully. If I set /proc/sys/kernel/sched_migration_cost=1500000 (the value is in ns;
the default is 500000, i.e. 0.5 ms), the volanoMark result improves significantly, close to the
2.6.26 result. The tests above used CONFIG_GROUP_SCHED=n. So perhaps some key data structures
changed in 2.6.27-rc1 in a way that causes more cache misses. With 2.6.26, CPU idle time is about
6~7%; with 2.6.27-rc1, it is about 1%. I compared the 2 kernels but couldn't find which data
structure change causes it.
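
The effect of sched_migration_cost can be illustrated with a simplified model of the cache-hot heuristic: a task that last ran less than sched_migration_cost nanoseconds ago is assumed to still have a warm cache, so the load balancer prefers to leave it on its current CPU. This is a sketch under that assumption, not the kernel's task_hot() itself; `struct model_task`, `model_task_hot` and `count_movable` are illustrative names. It shows why raising the value from 500000 to 1500000 leaves fewer tasks eligible for migration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default value of /proc/sys/kernel/sched_migration_cost, in nanoseconds. */
#define DEFAULT_MIGRATION_COST_NS 500000LL

/* Illustrative stand-in for the per-task state the heuristic looks at. */
struct model_task {
    int64_t exec_start_ns;   /* when the task last ran on its CPU */
};

/*
 * Simplified cache-hot check: a task that ran less than migration_cost_ns
 * ago is treated as cache-hot and should not be migrated.
 */
static bool model_task_hot(const struct model_task *p, int64_t now_ns,
                           int64_t migration_cost_ns)
{
    int64_t delta = now_ns - p->exec_start_ns;

    return delta < migration_cost_ns;
}

/* Count how many tasks the balancer would consider movable (not hot). */
static int count_movable(const struct model_task *tasks, int n,
                         int64_t now_ns, int64_t migration_cost_ns)
{
    int movable = 0;

    for (int i = 0; i < n; i++)
        if (!model_task_hot(&tasks[i], now_ns, migration_cost_ns))
            movable++;
    return movable;
}
```

For three tasks that last ran 2.0 ms, 1.4 ms and 0.8 ms ago, all three count as movable with the 0.5 ms default, but only one does at 1.5 ms.
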

As for CONFIG_GROUP_SCHED=y, oprofile shows tg_shares_up consuming about 8% of CPU time
on my 16-core Tigerton. Enlarging /proc/sys/kernel/sched_shares_ratelimit doesn't help the
volanoMark result. I checked the group scheduling code and have an idea to improve it: add
share_percent, a new variable in task_group->sched_entity[i], to record the percentage this
task group occupies in its parent group. share_percent is updated in walk_tg_tree. In
account_entity_enqueue, if the entity has a parent, we could simply use share_percent and
se->load.weight to calculate a new weight, add it to the parent entity's weight, and finally
to the runqueue load weight. That way, the various load balancers could still work well when
sched_shares_ratelimit is enlarged. I think volanoMark could benefit from it.
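
A rough C sketch of one reading of the share_percent proposal, assuming share_percent is an integer percentage stored per group entity and refreshed periodically (in the proposal, from walk_tg_tree). At enqueue time the weight is added to the group, then scaled by that group's share_percent on the way up, and whatever reaches the top is added to the runqueue load, so no share recomputation is needed on the enqueue path. Apart from share_percent, the struct, fields and helper names are hypothetical and do not reflect the kernel's actual task_group code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical group-level entity; only share_percent comes from the proposal. */
struct group_se {
    unsigned int share_percent;   /* % of the parent group this group occupies */
    unsigned long load_weight;    /* weight enqueued inside this group */
    struct group_se *parent;      /* NULL when the parent is the root runqueue */
};

/* Periodic refresh, e.g. from the tree walk in the proposal. */
static void set_share_percent(struct group_se *g, unsigned long group_shares,
                              unsigned long parent_total)
{
    assert(group_shares <= parent_total);
    g->share_percent =
        parent_total ? (unsigned int)(group_shares * 100 / parent_total) : 0;
}

/*
 * Enqueue weight task_weight into group g: add it to g, scale it by g's
 * share_percent for the level above, and repeat up the hierarchy. Returns
 * the amount finally added to the runqueue load.
 */
static unsigned long enqueue_scaled(struct group_se *g, unsigned long task_weight,
                                    unsigned long *rq_load)
{
    unsigned long w = task_weight;

    for (; g != NULL; g = g->parent) {
        g->load_weight += w;              /* weight seen inside this group */
        w = w * g->share_percent / 100;   /* this group's share one level up */
    }
    *rq_load += w;
    return w;
}
```

With a top-level group at 50% and a child group at 50% of it, a task of weight 1024 contributes 512 to the runqueue from the top-level group and 256 from the child group. Integer percentages lose precision; a fixed-point ratio would be a natural refinement.
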

BTW, with CONFIG_GROUP_SCHED=y, hackbench shows about an 80% regression on my 8-core,
multi-threaded Montvale Itanium machine and on Tulsa machines. It seems that multi-threaded
machines have the regression.

-yanmin

