Message-ID: <20230502115408.GC1597538@hirez.programming.kicks-ass.net>
Date: Tue, 2 May 2023 13:54:08 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Chen Yu <yu.c.chen@...el.com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Mel Gorman <mgorman@...hsingularity.net>,
Tim Chen <tim.c.chen@...el.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
K Prateek Nayak <kprateek.nayak@....com>,
Abel Wu <wuyun.abel@...edance.com>,
Yicong Yang <yangyicong@...ilicon.com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>,
Honglei Wang <wanghonglei@...ichuxing.com>,
Len Brown <len.brown@...el.com>,
Chen Yu <yu.chen.surf@...il.com>,
Tianchen Ding <dtcccc@...ux.alibaba.com>,
Joel Fernandes <joel@...lfernandes.org>,
Josh Don <joshdon@...gle.com>,
kernel test robot <yujie.liu@...el.com>,
Arjan Van De Ven <arjan.van.de.ven@...el.com>,
Aaron Lu <aaron.lu@...el.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v8 2/2] sched/fair: Introduce SIS_CURRENT to wake up
short task on current CPU
On Mon, May 01, 2023 at 11:52:47PM +0800, Chen Yu wrote:
> > So,... I've been poking around with this a bit today and I'm not seeing
> > it. On my ancient IVB-EP (2*10*2) with the code as in
> > queue/sched/core I get:
> >
> > netperf                     NO_SIS_CURRENT  SIS_CURRENT         %
> > --------------------------  --------------  -----------  --------
> > TCP_SENDFILE-1 : Avg: 42001 40783.4 -2.89898
> > TCP_SENDFILE-10 : Avg: 37065.1 36604.4 -1.24295
> > TCP_SENDFILE-20 : Avg: 21004.4 21356.9 1.67822
> > TCP_SENDFILE-40 : Avg: 7079.93 7231.3 2.13802
> > TCP_SENDFILE-80 : Avg: 3582.98 3615.85 0.917393
> > TCP_STREAM-1 : Avg: 37134.5 35095.4 -5.49112
> > TCP_STREAM-10 : Avg: 31260.7 31588.1 1.04732
> > TCP_STREAM-20 : Avg: 17996.6 17937.4 -0.328951
> > TCP_STREAM-40 : Avg: 7710.4 7790.62 1.04041
> > TCP_STREAM-80 : Avg: 2601.51 2903.89 11.6232
> > TCP_RR-1 : Avg: 81167.8 83541.3 2.92419
> > TCP_RR-10 : Avg: 71123.2 69447.9 -2.35549
> > TCP_RR-20 : Avg: 50905.4 52157.2 2.45907
> > TCP_RR-40 : Avg: 46289.2 46350.7 0.13286
> > TCP_RR-80 : Avg: 22024.4 22229.2 0.929878
> > UDP_RR-1 : Avg: 95997.2 96553.3 0.579288
> > UDP_RR-10 : Avg: 83878.5 78998.6 -5.81782
> > UDP_RR-20 : Avg: 61838.8 62926 1.75812
> > UDP_RR-40 : Avg: 56456.1 57115.2 1.16746
> > UDP_RR-80 : Avg: 27635.2 27784.8 0.541339
> > UDP_STREAM-1 : Avg: 52808.2 51908.6 -1.70352
> > UDP_STREAM-10 : Avg: 43115 43561.2 1.03491
> > UDP_STREAM-20 : Avg: 18798.7 20066 6.74142
> > UDP_STREAM-40 : Avg: 13070.5 13110.2 0.303737
> > UDP_STREAM-80 : Avg: 6248.86 6413.09 2.62816
> > tbench
> > WA_WEIGHT, WA_BIAS, NO_SIS_CURRENT (aka, mainline)
> >
> > Throughput 649.46 MB/sec 2 clients 2 procs max_latency=0.092 ms
> > Throughput 1370.93 MB/sec 5 clients 5 procs max_latency=0.140 ms
> > Throughput 1904.14 MB/sec 10 clients 10 procs max_latency=0.470 ms
> > Throughput 2406.15 MB/sec 20 clients 20 procs max_latency=0.276 ms
> > Throughput 2419.40 MB/sec 40 clients 40 procs max_latency=0.414 ms
> > Throughput 2426.00 MB/sec 80 clients 80 procs max_latency=1.366 ms
> >
> > WA_WEIGHT, WA_BIAS, SIS_CURRENT (aka, with patches on)
> >
> > Throughput 646.55 MB/sec 2 clients 2 procs max_latency=0.104 ms
> > Throughput 1361.06 MB/sec 5 clients 5 procs max_latency=0.100 ms
> > Throughput 1889.82 MB/sec 10 clients 10 procs max_latency=0.154 ms
> > Throughput 2406.57 MB/sec 20 clients 20 procs max_latency=3.667 ms
> > Throughput 2318.00 MB/sec 40 clients 40 procs max_latency=0.390 ms
> > Throughput 2384.85 MB/sec 80 clients 80 procs max_latency=1.371 ms
> >
> >
> > So what's going on here? I don't see anything exciting happening at the
> > 40 mark. At the same time, I can't seem to reproduce Mike's latency pile
> > up either :/
> >
> Thank you very much for trying this patch. This patch was found to mainly
> benefit systems with a large number of CPUs in 1 LLC. Previously I tested
> it on Sapphire Rapids (2x56C/224T) and Ice Lake Server (2x32C/128T)[1],
> and it showed benefits on both. The benefit seems to come from:
> 1. reducing the waker stacking among many CPUs within 1 LLC
I should be seeing that at 10 cores per LLC. And when we look at the
tbench results (never the most stable -- let me run a few more of those)
it looks like SIS_CURRENT is actually making that worse.
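(For clarity on what I'm actually testing, my reading of the patch is roughly
the sketch below -- illustration only, the helper name is made up and this is
not the actual hunk -- i.e. short waker + short wakee + otherwise idle current
CPU means we wake on the current CPU instead of scanning the LLC for an idle
one.)

	/*
	 * Illustration only, not the code from the patch; is_short_task()
	 * stands in for whatever average-duration check the patch uses.
	 * Idea: if both waker and wakee are short-running, place the wakee
	 * on the waker's CPU rather than searching the LLC for an idle CPU,
	 * so short tasks don't fan out across the LLC.
	 */
	static bool wake_on_current_cpu(int this_cpu, struct task_struct *p)
	{
		if (!sched_feat(SIS_CURRENT))
			return false;

		/* Both tasks need a short average run duration. */
		if (!is_short_task(current) || !is_short_task(p))
			return false;

		/* Only if this CPU isn't already busy with other work. */
		return cpu_rq(this_cpu)->nr_running <= 1;
	}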
That latency spike at 20 clients seems stable for me -- and 3ms is on the
small side; I've seen it go up to 11ms (typically in the 4-6ms range). This does not
happen with NO_SIS_CURRENT and is a fairly big point against these
patches.
> 2. reducing the C2C overhead within 1 LLC
This is due to how L3 became non-inclusive with Skylake? I can't see
that because I don't have anything that recent :/
> So far I have not received any performance difference reports from LKP on
> desktop test boxes. Let me queue the full test on some desktops to confirm
> whether this change has any impact on them.
Right, so I've updated my netperf results above to include the relative
difference between NO_SIS_CURRENT and SIS_CURRENT, and I see some losses
at the low end. For servers those are compensated at the high end, but
desktops tend not to get there much.
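For reference, the % column above is simply the relative change of
SIS_CURRENT vs NO_SIS_CURRENT (e.g. TCP_SENDFILE-1: (40783.4 - 42001) /
42001 * 100 ~= -2.9). Trivial, but spelled out so we're all comparing the
same thing; a minimal stand-alone snippet:

	#include <stdio.h>

	/* Relative change (%) of SIS_CURRENT vs NO_SIS_CURRENT. */
	static double rel_diff(double no_sis, double sis)
	{
		return (sis - no_sis) / no_sis * 100.0;
	}

	int main(void)
	{
		/* TCP_SENDFILE-1 row: prints -2.89898 */
		printf("%g\n", rel_diff(42001.0, 40783.4));
		return 0;
	}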