Message-ID: <d05abb37e19055d249b720c1ac448734fc6ea84f.camel@gmx.de>
Date: Tue, 02 Jul 2024 07:08:35 +0200
From: Mike Galbraith <efault@....de>
To: Chen Yu <yu.c.chen@...el.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot
<vincent.guittot@...aro.org>, Tim Chen <tim.c.chen@...el.com>, Yujie Liu
<yujie.liu@...el.com>, K Prateek Nayak <kprateek.nayak@....com>, "Gautham R.
Shenoy" <gautham.shenoy@....com>, Chen Yu <yu.chen.surf@...il.com>,
linux-kernel@...r.kernel.org, Raghavendra K T <raghavendra.kt@....com>
Subject: Re: [PATCH 1/2] sched/fair: Record the average duration of a task

On Mon, 2024-07-01 at 22:57 +0800, Chen Yu wrote:
> > Just take a look at the high-speed ping-pong thing (not a benchmark,
> > that's a box full of tape measures, rather silly, but...). TCP_RR IS
> > 1:1, has as short a duration as network stack plus scheduler can
> > possibly make it, and is nearly synchronous to boot: two halves of a
> > whole, the ONLY thing you can certainly safely stack...
>
> I agree, this is a limited scenario.
>
> > but a shared L2 box still takes a wee hit when you do so.
>
> According to a test conducted last month on a system with 500+ CPUs, where 4 CPUs
> share the same L2 cache, an improvement of around 20% was observed (though not as
> much as on the platform without a shared L2).

This dinky box doesn't have 500 cores, but it's... aw, adorable :)
rpi4:/root # ONLY=TCP_RR netperf.sh
TCP_RR-1 unbound Avg: 31754 Sum: 31754
TCP_RR-1 stacked Avg: 26625 Sum: 26625
TCP_RR-1 cross-core Avg: 32325 Sum: 32325
rpi4:/root # tbench.sh 1 30 2>&1|grep Throughput
Throughput 139.024 MB/sec 1 clients 1 procs max_latency=1.116 ms
rpi4:/root # taskset -c 3 tbench.sh 1 30 2>&1|grep Throughput
Throughput 116.765 MB/sec 1 clients 1 procs max_latency=0.340 ms
rpi4:/root #

This little box, running its stock 6.6.33 distro kernel, pulls out a
cross-core win both for the maximally synchronous TCP_RR and for the
slightly less synchronous, but still pretty close, tbench. The numbers
mean little, though. One propagation speed is lovely, but were there
more, I'd be as stuck with them as I am with the rpi4's one-speed (all
ahead slow) gearbox.
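
To put a size on those wins, here is a quick sketch (not part of the original runs) that computes the relative throughput from the numbers above. It assumes the tbench "stacked" case is the taskset -c 3 run, which pins tbench.sh and everything it spawns to CPU 3, and compares it against the unbound run, since no explicit cross-core tbench run was shown. The dict layout and the relative_gain() helper are hypothetical, for illustration only.

```python
# Throughput figures copied from the rpi4 runs above; higher is better.
# TCP_RR values are the Avg column printed by netperf.sh, tbench is MB/sec.
results = {
    "TCP_RR": {"stacked": 26625, "cross-core": 32325},
    # Assumption: "taskset -c 3" (both tbench tasks on CPU 3) == stacked.
    "tbench": {"stacked": 116.765, "unbound": 139.024},
}


def relative_gain(slow, fast):
    """Hypothetical helper: percent gain of `fast` over `slow`."""
    return (fast - slow) / slow * 100.0


tcp_gain = relative_gain(results["TCP_RR"]["stacked"],
                         results["TCP_RR"]["cross-core"])
tb_gain = relative_gain(results["tbench"]["stacked"],
                        results["tbench"]["unbound"])

print(f"TCP_RR cross-core over stacked: {tcp_gain:+.1f}%")
print(f"tbench unbound over stacked:    {tb_gain:+.1f}%")
```

Run under Python 3, it should print about +21.4% for TCP_RR and +19.1% for tbench, i.e. stacking costs this shared-L2 box roughly a fifth of its throughput in both cases.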
-Mike