linux-kernel - Re: sched: Avoid SMT siblings in select_idle

Message-ID: <20120326173533.GA4689@linux.vnet.ibm.com>
Date:	Mon, 26 Mar 2012 23:05:33 +0530
From:	Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Ingo Molnar <mingo@...e.hu>, Mike Galbraith <efault@....de>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Paul Turner <pjt@...gle.com>
Subject: Re: sched: Avoid SMT siblings in select_idle_sibling() if possible

* Peter Zijlstra <peterz@...radead.org> [2012-03-26 10:36:00]:

> >                 tip     tip + patch 
> > 
> > volano          1       1.29   (29% improvement)
> > sysbench [n3]   1       2      (100% improvement)
> > tbench 1 [n4]   1       1.07   (7% improvement)
> > tbench 8 [n5]   1       1.26   (26% improvement)
> > httperf  [n6]   1       1.05   (5% improvement)
> > Trade           1       1.31   (31% improvement) 
> 
> That smells like there's more to the story, a 100% improvement is too
> much..

Yeah, I have rubbed my eyes several times to make sure it's true, and I ran
the same benchmark (sysbench) again just now! I can still recreate that ~100%
improvement with the patch.

To quickly recap my environment: I have a 16-cpu machine w/ 5 cgroups.
1 cgroup (8192 shares) hosts sysbench inside an 8-vcpu VM, while the remaining 4
cgroups (1024 shares each) host 4 cpu hogs running on bare metal.
Given this overcommitment, select_idle_sibling() should mostly be a
no-op (i.e. it won't find any idle cores and thus defaults to prev_cpu).
Also, the only tasks that will (sleep and) wake up are the VM tasks.
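
For reference, the share split above can be worked out with a few lines of
Python. This is only a rough sketch: it assumes CPU time under full
contention is divided machine-wide in proportion to cpu.shares, which
simplifies what CFS group scheduling actually does per cpu. The group names
are made up for illustration.

```python
from fractions import Fraction

# Shares as stated above: one VM cgroup plus four cpu-hog cgroups.
# The group names are hypothetical labels, not taken from the setup.
SHARES = {"vm": 8192, "hog1": 1024, "hog2": 1024, "hog3": 1024, "hog4": 1024}
NR_CPUS = 16


def entitlement(shares, nr_cpus):
    """Return each group's proportional slice of the machine, in CPUs.

    Assumption: every group is always runnable and CPU time is split
    strictly by weight across the whole machine.
    """
    total = sum(shares.values())
    return {name: Fraction(weight, total) * nr_cpus
            for name, weight in shares.items()}


ent = entitlement(SHARES, NR_CPUS)
for name, cpus in ent.items():
    print(f"{name}: {float(cpus):.2f} CPUs")
```

Under that assumption the VM group is entitled to about two thirds of the
machine, which is more than its 8 vcpus can consume, so time its vcpus spend
waiting on a runqueue would look like latency rather than a fair-share limit.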

Since the patch potentially affects (improves) scheduling latencies, I measured 
Sum(se.statistics.wait_sum) for the VM tasks over the benchmark run (5
iterations of sysbench).

tip         : 987240 ms
tip + patch : 280275 ms
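
To put the two totals in relative terms, here is a small Python sketch. The
inputs are just the two sums reported above; the helper names are mine.

```python
# Sum(se.statistics.wait_sum) over the VM tasks, in ms, as reported above.
WAIT_SUM_TIP_MS = 987240
WAIT_SUM_PATCHED_MS = 280275


def relative_reduction(before, after):
    """Fraction of the original wait time that went away."""
    return (before - after) / before


def speedup(before, after):
    """How many times smaller the patched total is."""
    return before / after


red = relative_reduction(WAIT_SUM_TIP_MS, WAIT_SUM_PATCHED_MS)
factor = speedup(WAIT_SUM_TIP_MS, WAIT_SUM_PATCHED_MS)
print(f"wait time reduced by {red:.1%}, i.e. {factor:.2f}x lower")
```

That works out to roughly 72% less total wait time, or about 3.5x lower.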

I will get more comprehensive perf data shortly and post. 

From what I can tell, the huge improvement in benchmark score is coming from
reduced latencies for its VM tasks. 

The hard part to figure out (when we are inside select_task_rq_fair()) is
whether any potential improvement in latencies (because of waking up on a
less loaded cpu) will offset the cost of potentially more L2-cache misses,
for which IMHO we don't have enough data to make a good decision.

- vatsa
