Message-ID: <ZQB7DWSuUmzql8/D@chenyu5-mobl2.ccr.corp.intel.com>
Date: Tue, 12 Sep 2023 22:51:57 +0800
From: Chen Yu <yu.c.chen@...el.com>
To: Mike Galbraith <efault@....de>
CC: K Prateek Nayak <kprateek.nayak@....com>,
Tim Chen <tim.c.chen@...el.com>, Aaron Lu <aaron.lu@...el.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
"Mel Gorman" <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>,
<linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Ingo Molnar <mingo@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Juri Lelli <juri.lelli@...hat.com>
Subject: Re: [RFC PATCH 2/2] sched/fair: skip the cache hot CPU in select_idle_cpu()
Hi Mike,
thanks for taking a look,
On 2023-09-12 at 11:39:55 +0200, Mike Galbraith wrote:
> On Mon, 2023-09-11 at 18:19 +0800, Chen Yu wrote:
> >
> > > Speaking of cache-hot idle CPU, is netperf actually more happy with
> > > piling on current CPU?
> >
> > Yes. Per my previous test, netperf of TCP_RR/UDP_RR really likes to
> > put the waker and wakee together.
>
> Hm, seems there's at least one shared L2 case where that's untrue by
> more than a tiny margin, which surprised me rather a lot.
>
Yes, task stacking in theory goes against the work-conserving behavior of
the scheduler. Whether it pays off depends on how much the tasks benefit from
resource locality (L1/L2 cache, DSB), which is workload and hardware specific.
> For grins, I tested netperf on my dinky rpi4b, and while its RR numbers
> seem kinda odd, they're also seemingly repeatable (ergo showing them).
> I measured a very modest cross-core win on a shared L2 Intel CPU some
> years ago (when Q6600 was shiny/new) but nothing close to these deltas.
>
This is interesting. I have a Jacobsville which also has a shared L2; I'll
run some tests to check the difference between task stacking and task
spreading on that platform. But I guess that is another topic, because the
current patch avoids stacking tasks.
thanks,
Chenyu
> Makes me wonder what (a tad beefier) Bulldog RR numbers look like.
>
> root@...4:~# ONLY=TCP_RR netperf.sh
> TCP_RR-1 unbound Avg: 29611 Sum: 29611
> TCP_RR-1 stacked Avg: 22540 Sum: 22540
> TCP_RR-1 cross-core Avg: 30181 Sum: 30181
>
> root@...4:~# netperf.sh
> TCP_SENDFILE-1 unbound Avg: 15572 Sum: 15572
> TCP_SENDFILE-1 stacked Avg: 11533 Sum: 11533
> TCP_SENDFILE-1 cross-core Avg: 15751 Sum: 15751
>
> TCP_STREAM-1 unbound Avg: 6331 Sum: 6331
> TCP_STREAM-1 stacked Avg: 6031 Sum: 6031
> TCP_STREAM-1 cross-core Avg: 6211 Sum: 6211
>
> TCP_MAERTS-1 unbound Avg: 6306 Sum: 6306
> TCP_MAERTS-1 stacked Avg: 6094 Sum: 6094
> TCP_MAERTS-1 cross-core Avg: 9393 Sum: 9393
>
> UDP_STREAM-1 unbound Avg: 22277 Sum: 22277
> UDP_STREAM-1 stacked Avg: 18844 Sum: 18844
> UDP_STREAM-1 cross-core Avg: 24749 Sum: 24749
>
> TCP_RR-1 unbound Avg: 29674 Sum: 29674
> TCP_RR-1 stacked Avg: 22267 Sum: 22267
> TCP_RR-1 cross-core Avg: 30237 Sum: 30237
>
> UDP_RR-1 unbound Avg: 36189 Sum: 36189
> UDP_RR-1 stacked Avg: 27129 Sum: 27129
> UDP_RR-1 cross-core Avg: 37033 Sum: 37033
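
For reference, the stacked vs cross-core gap in the numbers quoted above can
be expressed as a percentage. This is only a rough sketch: the averages are
copied from the second netperf.sh run, and the dictionary layout and the
helper name stacked_delta_pct are made up for illustration (they are not part
of netperf.sh).

```python
# Averages copied from the quoted netperf.sh output (second run, 1 pair each).
# The table layout here is an illustrative assumption, not netperf.sh's format.
results = {
    "TCP_SENDFILE": {"unbound": 15572, "stacked": 11533, "cross-core": 15751},
    "TCP_STREAM":   {"unbound": 6331,  "stacked": 6031,  "cross-core": 6211},
    "TCP_MAERTS":   {"unbound": 6306,  "stacked": 6094,  "cross-core": 9393},
    "UDP_STREAM":   {"unbound": 22277, "stacked": 18844, "cross-core": 24749},
    "TCP_RR":       {"unbound": 29674, "stacked": 22267, "cross-core": 30237},
    "UDP_RR":       {"unbound": 36189, "stacked": 27129, "cross-core": 37033},
}


def stacked_delta_pct(row):
    """Percent change of 'stacked' relative to 'cross-core'.

    A negative value means the stacked placement scored lower.
    """
    return (row["stacked"] - row["cross-core"]) / row["cross-core"] * 100.0


for name, row in results.items():
    print(f"{name:<13} stacked vs cross-core: {stacked_delta_pct(row):+.1f}%")
```

On this data every test scores lower when stacked than when run cross-core,
with the RR tests showing some of the larger gaps.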