Message-Id: <1267820755.6384.85.camel@marge.simson.net>
Date:	Fri, 05 Mar 2010 21:25:55 +0100
From:	Mike Galbraith <efault@....de>
To:	Suresh Siddha <suresh.b.siddha@...el.com>
Cc:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Ingo Molnar <mingo@...e.hu>,
	Arjan van de Ven <arjan@...ux.jf.intel.com>,
	linux-kernel@...r.kernel.org,
	Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>,
	Yanmin Zhang <yanmin_zhang@...ux.jf.intel.com>,
	Gautham R Shenoy <ego@...ibm.com>
Subject: Re: [patch 2/2] sched: fix select_idle_sibling() logic in
 select_task_rq_fair()

On Fri, 2010-03-05 at 10:39 -0800, Suresh Siddha wrote:
> plain text document attachment (fix_lat_ctx.patch)
> Performance improvements with this patch:
> "lat_ctx -s 0 2" ~22usec (before-this-patch)	~5usec (after-this-patch)

Hm.  On my Q6600 box, it's nowhere near that.

> There are a number of things wrong with the select_idle_sibling() logic:
> 
> a) Once we select the idle sibling, we use that domain (spanning the cpu
>    where the task is being woken up and the idle sibling we found) in our
>    wake_affine() comparisons.  This is completely different from the domain
>    we are supposed to use: the one spanning the cpu where the task is being
>    woken up and the cpu where the task previously ran.
> 
> b) We do the select_idle_sibling() check only for the cpu the task is being
>    woken up on.  If wake_affine() decides to select the cpu where the task
>    previously ran, doing a select_idle_sibling() check for that cpu would
>    also help, and we don't do that currently.
>  
> c) select_idle_sibling() should also treat the current cpu as an idle
>    cpu if this is a sync wakeup and it has only one task running.

I'm going to have to crawl over and test the above, but this bit sounds
like a decidedly un-good thing to do.  Maybe I'm misunderstanding.

Check these lmbench3 numbers, i.e. the AF UNIX numbers in the last three
runs vs the three above them.  That's what I get with the load running on
one core, because I disabled select_idle_sibling() for those runs to
compare the cost/benefit of using an idle shared-cache core.  The wakeup
in question is a sync wakeup; otherwise, we'd be taking the same beating
TCP takes in stock .31.12 and stock .33 (first two sets of triple runs).

Calling the waking cpu idle in that case is a mistake.  Just because the
sync hint was used does not mean there is no gain to be had.  In the case
of this benchmark proggy, that gain is a _lot_, and the same goes for the
TCP proggy after I enabled the sync hint in the smpx tree.  We certainly
don't want high-frequency cache misses, but we also don't want to assume
there's nothing to be had by using another core.  There's currently no
way to tell whether you can gain by using another core, other than to
try it.

If you run tip, you can see a throughput gain even with the pipe test,
because there's a buffer-increase patch there which, combined with
owner_spin, produces a gain even with this highly synchronous test.
select_idle_sibling() is only the enabler (hard to spin if you're on the
same core as the mutex owner :).


*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
marge     2.6.31.12-smp 0.730 2.845 4.85 6.463  11.3  26.2  14.9  31.
marge     2.6.31.12-smp 0.750 2.864 4.78 6.460  11.2  22.9  14.6  31.
marge     2.6.31.12-smp 0.710 2.835 4.81 6.478  11.5  11.0  14.5  30.
marge        2.6.33-smp 1.320 4.552 5.02 9.169  12.5  26.5  15.4  18.
marge        2.6.33-smp 1.450 4.621 5.45 9.286  12.5  11.4  15.4  18.
marge        2.6.33-smp 1.450 4.589 5.53 9.168  12.6  27.5  15.4  18.
marge       2.6.33-smpx 1.160 3.565 5.97 7.513  11.3 9.776  13.9  18.
marge       2.6.33-smpx 1.140 3.569 6.02 7.479  11.2 9.849  14.0  18.
marge       2.6.33-smpx 1.090 3.563 6.39 7.450  11.2 9.785  14.0  18.
marge       2.6.33-smpx 0.730 2.665 4.85 6.565  11.9  10.3  15.2  31.
marge       2.6.33-smpx 0.740 2.701 4.03 6.573  11.7  10.3  15.4  31.
marge       2.6.33-smpx 0.710 2.753 4.86 6.533  11.7  10.3  15.3  31.


*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
marge     2.6.31.12-smp 2821 2971 762. 2829.2 4799.0 1243.0 1230.3 4469 1682.
marge     2.6.31.12-smp 2824 2931 760. 2833.3 4736.5 1239.5 1235.8 4462 1678.
marge     2.6.31.12-smp 2796 2936 1139 2843.3 4815.7 1242.8 1234.6 4471 1685.
marge        2.6.33-smp 2670 5151 739. 2816.6 4768.5 1243.7 1237.2 4389 1684.
marge        2.6.33-smp 2627 5126 1135 2806.9 4783.1 1245.1 1236.1 4413 1684.
marge        2.6.33-smp 2582 5037 1137 2799.6 4755.4 1242.0 1239.1 4471 1683.
marge       2.6.33-smpx 2848 5184 2972 2820.5 4804.8 1242.6 1236.9 4315 1686.
marge       2.6.33-smpx 2804 5183 2934 2822.8 4759.3 1245.0 1234.7 4462 1688.
marge       2.6.33-smpx 2729 5177 2920 2837.6 4820.0 1246.9 1238.5 4467 1684.
marge       2.6.33-smpx 2843 2896 1928 2786.5 4751.2 1242.2 1238.6 4493 1682.
marge       2.6.33-smpx 2869 2886 1936 2841.4 4748.9 1244.3 1237.7 4456 1683.
marge       2.6.33-smpx 2845 2895 1947 2836.0 4813.6 1242.7 1236.3 4473 1674.




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
