Message-ID: <20210125090419.GW3592@techsingularity.net>
Date: Mon, 25 Jan 2021 09:04:19 +0000
From: Mel Gorman <mgorman@...hsingularity.net>
To: "Li, Aubrey" <aubrey.li@...ux.intel.com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Qais Yousef <qais.yousef@....com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 0/5] Scan for an idle sibling in a single pass
On Mon, Jan 25, 2021 at 12:29:47PM +0800, Li, Aubrey wrote:
> >>> hackbench -l 2560 -g 1 on 8 cores arm64
> >>> v5.11-rc4 : 1.355 (+/- 7.96%)
> >>> + sis improvement : 1.923 (+/- 25%)
> >>> + the patch below : 1.332 (+/- 4.95%)
> >>>
> >>> hackbench -l 2560 -g 256 on 8 cores arm64
> >>> v5.11-rc4 : 2.116 (+/- 4.62%)
> >>> + sis improvement : 2.216 (+/- 3.84%)
> >>> + the patch below : 2.113 (+/- 3.01%)
> >>>
>
> 4 benchmarks reported out during weekend, with patch 3 on a x86 4s system
> with 24 cores per socket and 2 HT per core, total 192 CPUs.
>
> It looks like mid-load has notable changes on my side:
> - netperf 50% num of threads in TCP mode has 27.25% improved
> - tbench 50% num of threads has 9.52% regression
>
It's interesting that patch 3 would make any difference on x86-64 given that
it's SMT2. The scan depth should have been similar. It's somewhat expected
that it will not be a universal win, particularly once the utilisation
is high enough to spill over in sched domains (25%, 50%, 75% utilisation
being interesting on 4-socket systems). In such cases, double scanning can
still show improvements for workloads that idle rapidly like tbench and
hackbench even though it's expensive. The extra scanning gives more time
for a CPU to go idle enough to be selected which can improve throughput
but at the cost of wake-up latency.

Hopefully v4 can be tested as well, which is now just a single scan.
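
To make the cost difference between double scanning and a single scan
concrete, here is a toy userspace model. It is only a sketch and not the
kernel's select_idle_sibling() code: the CPU count, the sibling layout
(CPUs 2n and 2n+1 sharing a core) and every toy_* name are assumptions
made up for illustration. It counts how many CPUs each approach inspects
before it settles on a target.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 8	/* hypothetical: 4 cores x SMT2 */

/* Assumed sibling layout for this toy: CPUs 2n and 2n+1 share a core. */
static int toy_sibling(int cpu)
{
	return cpu ^ 1;
}

/*
 * Two passes: first look for a fully idle core, then start again and
 * look for any idle CPU. *scanned counts the CPUs inspected in total.
 */
static int toy_two_pass(const bool idle[TOY_NR_CPUS], int *scanned)
{
	*scanned = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu += 2) {
		*scanned += 2;
		if (idle[cpu] && idle[toy_sibling(cpu)])
			return cpu;
	}

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		(*scanned)++;
		if (idle[cpu])
			return cpu;
	}

	return -1;
}

/*
 * Single pass: while searching for an idle core, remember the first
 * idle CPU seen and fall back to it if no idle core turns up.
 */
static int toy_single_pass(const bool idle[TOY_NR_CPUS], int *scanned)
{
	int fallback = -1;

	*scanned = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu += 2) {
		int sib = toy_sibling(cpu);

		*scanned += 2;
		if (idle[cpu] && idle[sib])
			return cpu;

		if (fallback < 0) {
			if (idle[cpu])
				fallback = cpu;
			else if (idle[sib])
				fallback = sib;
		}
	}

	return fallback;
}
```

With only CPU 5 idle, both functions pick CPU 5, but the two-pass version
inspects 14 CPUs against 8 for the single pass. What the toy cannot show
is the effect described above: in a real system the idle state changes
while the scan is running, so the longer scan can find a CPU that was
still busy when the scan started.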
--
Mel Gorman
SUSE Labs