Message-ID: <20230315152552.GF2006103@hirez.programming.kicks-ass.net>
Date: Wed, 15 Mar 2023 16:25:52 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Chen Yu <yu.c.chen@...el.com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Mel Gorman <mgorman@...hsingularity.net>,
Tim Chen <tim.c.chen@...el.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
K Prateek Nayak <kprateek.nayak@....com>,
Abel Wu <wuyun.abel@...edance.com>,
Yicong Yang <yangyicong@...ilicon.com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>,
Honglei Wang <wanghonglei@...ichuxing.com>,
Len Brown <len.brown@...el.com>,
Chen Yu <yu.chen.surf@...il.com>,
Tianchen Ding <dtcccc@...ux.alibaba.com>,
Joel Fernandes <joel@...lfernandes.org>,
Josh Don <joshdon@...gle.com>, Hillf Danton <hdanton@...a.com>,
kernel test robot <yujie.liu@...el.com>,
Arjan Van De Ven <arjan.van.de.ven@...el.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 2/2] sched/fair: Introduce SIS_SHORT to wake up short
task on current CPU
On Wed, Feb 22, 2023 at 10:09:55PM +0800, Chen Yu wrote:
> will-it-scale
> =============
> case              load         baseline   compare%
> context_switch1   224 groups   1.00       +946.68%
>
> There is a huge improvement in the fast context-switch test case, especially
> when the number of groups equals the number of CPUs.
>
> netperf
> =======
> case      load          baseline(std%)   compare%( std%)
> TCP_RR    56-threads     1.00 (  1.12)     -0.05 (  0.97)
> TCP_RR    112-threads    1.00 (  0.50)     +0.31 (  0.35)
> TCP_RR    168-threads    1.00 (  3.46)     +5.50 (  2.08)
> TCP_RR    224-threads    1.00 (  2.52)   +665.38 (  3.38)
> TCP_RR    280-threads    1.00 ( 38.59)    +22.12 ( 11.36)
> TCP_RR    336-threads    1.00 ( 15.88)     -0.00 ( 19.96)
> TCP_RR    392-threads    1.00 ( 27.22)     +0.26 ( 24.26)
> TCP_RR    448-threads    1.00 ( 37.88)     +0.04 ( 27.87)
> UDP_RR    56-threads     1.00 (  2.39)     -0.36 (  8.33)
> UDP_RR    112-threads    1.00 ( 22.62)     -0.65 ( 24.66)
> UDP_RR    168-threads    1.00 ( 15.72)     +3.97 (  5.02)
> UDP_RR    224-threads    1.00 ( 15.90)   +134.98 ( 28.59)
> UDP_RR    280-threads    1.00 ( 32.43)     +0.26 ( 29.68)
> UDP_RR    336-threads    1.00 ( 39.21)     -0.05 ( 39.71)
> UDP_RR    392-threads    1.00 ( 31.76)     -0.22 ( 32.00)
> UDP_RR    448-threads    1.00 ( 44.90)     +0.06 ( 31.83)
>
> There is a significant 600+% improvement for TCP_RR and 100+% for UDP_RR
> when the number of threads equals the number of CPUs.
>
> tbench
> ======
> case        load          baseline(std%)   compare%( std%)
> loopback    56-threads     1.00 (  0.15)     +0.88 (  0.08)
> loopback    112-threads    1.00 (  0.06)     -0.41 (  0.52)
> loopback    168-threads    1.00 (  0.17)    +45.42 ( 39.54)
> loopback    224-threads    1.00 ( 36.93)    +24.10 (  0.06)
> loopback    280-threads    1.00 (  0.04)     -0.04 (  0.04)
> loopback    336-threads    1.00 (  0.06)     -0.16 (  0.14)
> loopback    392-threads    1.00 (  0.05)     +0.06 (  0.02)
> loopback    448-threads    1.00 (  0.07)     -0.02 (  0.07)
>
> There is no noticeable impact on tbench, although there is run-to-run
> variance in the 168/224-thread cases, with or without this patch applied.
So there is a very narrow, but significant, win at 4x overload.

What about 3x/5x overload? Those only show very marginal gains.

So these patches are brilliant if you run at exactly 4x overload, and
very meh otherwise.

Why do we care about 4x overload?
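
For reference, a minimal standalone sketch of the idea named in $subject
(SIS_SHORT: if both the waker and the wakee are short-duration tasks,
queue the wakee on the waker's current CPU instead of searching for an
idle sibling). The struct fields, helper names and the 0.5 ms cut-off
are illustrative assumptions for discussion, not the actual patch:

/*
 * Standalone illustration only: all names, fields and thresholds here
 * are made up for this sketch and are not the kernel's actual code.
 */
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-task accounting: average time on CPU per run, in ns. */
struct task_info {
	unsigned long long avg_run_ns;
};

/* Hypothetical view of the waker's CPU at wakeup time. */
struct cpu_info {
	int nr_running;			/* tasks already queued there */
};

/* Assumed cut-off for what counts as a "short" task: 0.5 ms. */
#define SHORT_TASK_NS	(500ULL * 1000)

static bool is_short_task(const struct task_info *p)
{
	return p->avg_run_ns && p->avg_run_ns < SHORT_TASK_NS;
}

/*
 * SIS_SHORT-style decision: if waker and wakee are both short and the
 * waker's CPU is not piling up work, place the wakee right there; the
 * waker will yield the CPU soon, so the wakee runs quickly and no
 * cross-CPU search or migration cost is paid.
 */
static bool wake_on_current_cpu(const struct task_info *waker,
				const struct task_info *wakee,
				const struct cpu_info *cur)
{
	return is_short_task(waker) && is_short_task(wakee) &&
	       cur->nr_running <= 1;
}

int main(void)
{
	struct task_info waker = { .avg_run_ns = 200 * 1000 };	/* 0.2 ms */
	struct task_info wakee = { .avg_run_ns = 150 * 1000 };	/* 0.15 ms */
	struct cpu_info cur = { .nr_running = 1 };

	printf("place wakee on current CPU: %s\n",
	       wake_on_current_cpu(&waker, &wakee, &cur) ? "yes" : "no");
	return 0;
}

Whether skipping the idle-CPU search like this is worth it outside the
heavily overloaded cases is exactly the question raised above.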