Message-ID: <20230315152552.GF2006103@hirez.programming.kicks-ass.net>
Date: Wed, 15 Mar 2023 16:25:52 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Chen Yu <yu.c.chen@...el.com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Mel Gorman <mgorman@...hsingularity.net>,
Tim Chen <tim.c.chen@...el.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
K Prateek Nayak <kprateek.nayak@....com>,
Abel Wu <wuyun.abel@...edance.com>,
Yicong Yang <yangyicong@...ilicon.com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>,
Honglei Wang <wanghonglei@...ichuxing.com>,
Len Brown <len.brown@...el.com>,
Chen Yu <yu.chen.surf@...il.com>,
Tianchen Ding <dtcccc@...ux.alibaba.com>,
Joel Fernandes <joel@...lfernandes.org>,
Josh Don <joshdon@...gle.com>, Hillf Danton <hdanton@...a.com>,
kernel test robot <yujie.liu@...el.com>,
Arjan Van De Ven <arjan.van.de.ven@...el.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 2/2] sched/fair: Introduce SIS_SHORT to wake up short
task on current CPU
On Wed, Feb 22, 2023 at 10:09:55PM +0800, Chen Yu wrote:
> will-it-scale
> =============
> case              load         baseline   compare%
> context_switch1   224 groups   1.00       +946.68%
>
> There is a huge improvement in the fast context-switch test case, especially
> when the number of groups equals the number of CPUs.
>
> netperf
> =======
> case      load          baseline(std%)   compare%( std%)
> TCP_RR    56-threads     1.00 (  1.12)     -0.05 (  0.97)
> TCP_RR    112-threads    1.00 (  0.50)     +0.31 (  0.35)
> TCP_RR    168-threads    1.00 (  3.46)     +5.50 (  2.08)
> TCP_RR    224-threads    1.00 (  2.52)   +665.38 (  3.38)
> TCP_RR    280-threads    1.00 ( 38.59)    +22.12 ( 11.36)
> TCP_RR    336-threads    1.00 ( 15.88)     -0.00 ( 19.96)
> TCP_RR    392-threads    1.00 ( 27.22)     +0.26 ( 24.26)
> TCP_RR    448-threads    1.00 ( 37.88)     +0.04 ( 27.87)
> UDP_RR    56-threads     1.00 (  2.39)     -0.36 (  8.33)
> UDP_RR    112-threads    1.00 ( 22.62)     -0.65 ( 24.66)
> UDP_RR    168-threads    1.00 ( 15.72)     +3.97 (  5.02)
> UDP_RR    224-threads    1.00 ( 15.90)   +134.98 ( 28.59)
> UDP_RR    280-threads    1.00 ( 32.43)     +0.26 ( 29.68)
> UDP_RR    336-threads    1.00 ( 39.21)     -0.05 ( 39.71)
> UDP_RR    392-threads    1.00 ( 31.76)     -0.22 ( 32.00)
> UDP_RR    448-threads    1.00 ( 44.90)     +0.06 ( 31.83)
>
> There is a significant 600+% improvement for TCP_RR and 100+% for UDP_RR
> when the number of threads equals the number of CPUs.
>
> tbench
> ======
> case        load          baseline(std%)   compare%( std%)
> loopback    56-threads     1.00 (  0.15)     +0.88 (  0.08)
> loopback    112-threads    1.00 (  0.06)     -0.41 (  0.52)
> loopback    168-threads    1.00 (  0.17)    +45.42 ( 39.54)
> loopback    224-threads    1.00 ( 36.93)    +24.10 (  0.06)
> loopback    280-threads    1.00 (  0.04)     -0.04 (  0.04)
> loopback    336-threads    1.00 (  0.06)     -0.16 (  0.14)
> loopback    392-threads    1.00 (  0.05)     +0.06 (  0.02)
> loopback    448-threads    1.00 (  0.07)     -0.02 (  0.07)
>
> There is no noticeable impact on tbench, although there is run-to-run
> variance in the 168/224-thread cases, with or without this patch applied.
So there is a very narrow, but significant, win at 4x overload.

What about 3x/5x overload? Those only show very marginal gains.

So these patches are brilliant if you run at exactly 4x overload, and
very meh otherwise.

Why do we care about 4x overload?
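
For reference, a minimal standalone sketch of the idea named in $subject
(SIS_SHORT: if both the waker and the wakee are short-duration tasks,
queue the wakee on the waker's current CPU instead of searching for an
idle sibling). The struct fields, helper names and the 0.5 ms cut-off
are illustrative assumptions for discussion, not the actual patch:

/*
 * Standalone illustration only: all names, fields and thresholds here
 * are made up for this sketch and are not the kernel's actual code.
 */
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-task accounting: average time on CPU per run, in ns. */
struct task_info {
	unsigned long long avg_run_ns;
};

/* Hypothetical view of the waker's CPU at wakeup time. */
struct cpu_info {
	int nr_running;			/* tasks already queued there */
};

/* Assumed cut-off for what counts as a "short" task: 0.5 ms. */
#define SHORT_TASK_NS	(500ULL * 1000)

static bool is_short_task(const struct task_info *p)
{
	return p->avg_run_ns && p->avg_run_ns < SHORT_TASK_NS;
}

/*
 * SIS_SHORT-style decision: if waker and wakee are both short and the
 * waker's CPU is not piling up work, place the wakee right there; the
 * waker will yield the CPU soon, so the wakee runs quickly and no
 * cross-CPU search or migration cost is paid.
 */
static bool wake_on_current_cpu(const struct task_info *waker,
				const struct task_info *wakee,
				const struct cpu_info *cur)
{
	return is_short_task(waker) && is_short_task(wakee) &&
	       cur->nr_running <= 1;
}

int main(void)
{
	struct task_info waker = { .avg_run_ns = 200 * 1000 };	/* 0.2 ms */
	struct task_info wakee = { .avg_run_ns = 150 * 1000 };	/* 0.15 ms */
	struct cpu_info cur = { .nr_running = 1 };

	printf("place wakee on current CPU: %s\n",
	       wake_on_current_cpu(&waker, &wakee, &cur) ? "yes" : "no");
	return 0;
}

Whether skipping the idle-CPU search like this is worth it outside the
heavily overloaded cases is exactly the question raised above.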