Message-ID: <20230315152552.GF2006103@hirez.programming.kicks-ass.net>
Date:   Wed, 15 Mar 2023 16:25:52 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Chen Yu <yu.c.chen@...el.com>
Cc:     Vincent Guittot <vincent.guittot@...aro.org>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Tim Chen <tim.c.chen@...el.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>,
        K Prateek Nayak <kprateek.nayak@....com>,
        Abel Wu <wuyun.abel@...edance.com>,
        Yicong Yang <yangyicong@...ilicon.com>,
        "Gautham R . Shenoy" <gautham.shenoy@....com>,
        Honglei Wang <wanghonglei@...ichuxing.com>,
        Len Brown <len.brown@...el.com>,
        Chen Yu <yu.chen.surf@...il.com>,
        Tianchen Ding <dtcccc@...ux.alibaba.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        Josh Don <joshdon@...gle.com>, Hillf Danton <hdanton@...a.com>,
        kernel test robot <yujie.liu@...el.com>,
        Arjan Van De Ven <arjan.van.de.ven@...el.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 2/2] sched/fair: Introduce SIS_SHORT to wake up short
 task on current CPU

On Wed, Feb 22, 2023 at 10:09:55PM +0800, Chen Yu wrote:

> will-it-scale
> =============
> case			load		baseline	compare%
> context_switch1		224 groups	1.00		+946.68%
> 
> There is a huge improvement in the fast context switch test case, especially
> when the number of groups equals the number of CPUs.
> 
> netperf
> =======
> case            	load    	baseline(std%)	compare%( std%)
> TCP_RR          	56-threads	 1.00 (  1.12)	 -0.05 (  0.97)
> TCP_RR          	112-threads	 1.00 (  0.50)	 +0.31 (  0.35)
> TCP_RR          	168-threads	 1.00 (  3.46)	 +5.50 (  2.08)
> TCP_RR          	224-threads	 1.00 (  2.52)	+665.38 (  3.38)
> TCP_RR          	280-threads	 1.00 ( 38.59)	+22.12 ( 11.36)
> TCP_RR          	336-threads	 1.00 ( 15.88)	 -0.00 ( 19.96)
> TCP_RR          	392-threads	 1.00 ( 27.22)	 +0.26 ( 24.26)
> TCP_RR          	448-threads	 1.00 ( 37.88)	 +0.04 ( 27.87)
> UDP_RR          	56-threads	 1.00 (  2.39)	 -0.36 (  8.33)
> UDP_RR          	112-threads	 1.00 ( 22.62)	 -0.65 ( 24.66)
> UDP_RR          	168-threads	 1.00 ( 15.72)	 +3.97 (  5.02)
> UDP_RR          	224-threads	 1.00 ( 15.90)	+134.98 ( 28.59)
> UDP_RR          	280-threads	 1.00 ( 32.43)	 +0.26 ( 29.68)
> UDP_RR          	336-threads	 1.00 ( 39.21)	 -0.05 ( 39.71)
> UDP_RR          	392-threads	 1.00 ( 31.76)	 -0.22 ( 32.00)
> UDP_RR          	448-threads	 1.00 ( 44.90)	 +0.06 ( 31.83)
> 
> There is a significant 600+% improvement for TCP_RR and 100+% for UDP_RR
> when the number of threads equals the number of CPUs.
> 
> tbench
> ======
> case            	load    	baseline(std%)	compare%( std%)
> loopback        	56-threads	 1.00 (  0.15)	 +0.88 (  0.08)
> loopback        	112-threads	 1.00 (  0.06)	 -0.41 (  0.52)
> loopback        	168-threads	 1.00 (  0.17)	+45.42 ( 39.54)
> loopback        	224-threads	 1.00 ( 36.93)	+24.10 (  0.06)
> loopback        	280-threads	 1.00 (  0.04)	 -0.04 (  0.04)
> loopback        	336-threads	 1.00 (  0.06)	 -0.16 (  0.14)
> loopback        	392-threads	 1.00 (  0.05)	 +0.06 (  0.02)
> loopback        	448-threads	 1.00 (  0.07)	 -0.02 (  0.07)
> 
> There is no noticeable impact on tbench, although there is run-to-run variance
> in the 168/224-thread cases, with or without this patch applied.

So there is a very narrow, but significant, win at 4x overload.
What about 3x/5x overload? Those only show very marginal gains.

So these patches are brilliant if you run at exactly 4x overload, and
very meh otherwise.

Why do we care about 4x overload?
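
To make the shape of that win concrete, here is a minimal sketch that re-keys the TCP_RR compare% column from the quoted netperf table by load multiple. It assumes the 56-thread run is the 1x unit, which is what makes 224 threads "4x" and 168/280 threads "3x/5x"; the names BASE_THREADS, gain_by_multiple and peak are made up for illustration and appear nowhere in the patch or the benchmark tooling.

```python
# TCP_RR compare% values copied from the quoted netperf table.
# Assumption: the smallest run (56 threads) is treated as the 1x load unit.
BASE_THREADS = 56

tcp_rr_gain = {
    56: -0.05, 112: 0.31, 168: 5.50, 224: 665.38,
    280: 22.12, 336: -0.00, 392: 0.26, 448: 0.04,
}

def gain_by_multiple(gains, base=BASE_THREADS):
    """Re-key the gains by load multiple (thread count divided by base)."""
    return {threads // base: pct for threads, pct in sorted(gains.items())}

def peak(gains_by_mult):
    """Return the load multiple with the largest gain."""
    return max(gains_by_mult, key=gains_by_mult.get)

by_mult = gain_by_multiple(tcp_rr_gain)
for mult, pct in by_mult.items():
    print(f"{mult}x ({mult * BASE_THREADS:3d} threads): {pct:+8.2f}%")
print(f"peak at {peak(by_mult)}x")
```

Under that assumption the peak lands on 4x, with the neighbouring 3x and 5x rows at +5.50% and +22.12%, and everything else within a fraction of a percent of baseline.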
