Date: Sat, 25 Feb 2012 12:24:03 +0530
From: Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>
To: Mike Galbraith <efault@....de>
Cc: Peter Zijlstra <peterz@...radead.org>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...e.hu>, Paul Turner <pjt@...gle.com>
Subject: Re: sched: Avoid SMT siblings in select_idle_sibling() if possible

* Mike Galbraith <efault@....de> [2012-02-23 12:21:04]:

> Unpinned netperf TCP_RR and/or tbench pairs? Anything that's wakeup
> heavy should tell the tale.

Here are some tbench numbers:

Machine : 2 Intel Xeon X5650 (Westmere) CPUs (6 cores/package)
Kernel  : tip (HEAD at ebe97fa)
dbench  : v4.0

One tbench server/client pair was run on the same host 5 times (with the
fs cache being purged each time) and the average of the 5 runs for the
various cases is noted below:

Case A : HT enabled (24 logical CPUs)

Thr'put : 168.166 MB/s (SD_SHARE_PKG_RESOURCES + !SD_BALANCE_WAKE)
Thr'put : 169.564 MB/s (SD_SHARE_PKG_RESOURCES + SD_BALANCE_WAKE at mc/smt)
Thr'put : 173.151 MB/s (!SD_SHARE_PKG_RESOURCES + !SD_BALANCE_WAKE)

Case B : HT disabled (12 logical CPUs)

Thr'put : 167.977 MB/s (SD_SHARE_PKG_RESOURCES + !SD_BALANCE_WAKE)
Thr'put : 167.891 MB/s (SD_SHARE_PKG_RESOURCES + SD_BALANCE_WAKE at mc)
Thr'put : 173.801 MB/s (!SD_SHARE_PKG_RESOURCES + !SD_BALANCE_WAKE)

Observations:

a. A ~3% improvement is seen with SD_SHARE_PKG_RESOURCES disabled, which
   I guess reflects the cost of waking to a cold L2 cache.

b. No degradation is seen with SD_BALANCE_WAKE enabled at the mc/smt
   domains.

IMO we need to detect tbench-type paired wakeups as the synchronous case,
and then blindly wake the task on cur_cpu (as the cost of an L2 cache
miss could outweigh the cost of any reduced scheduling latency). IOW,
select_task_rq_fair() needs to be given a better hint as to whether the
L2 cache has been made warm by someone (an interrupt handler or a
producer task), in which case the (consumer) task needs to be woken in
the same L2 cache domain (i.e. on cur_cpu itself).
- vatsa