lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210125090419.GW3592@techsingularity.net>
Date:   Mon, 25 Jan 2021 09:04:19 +0000
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     "Li, Aubrey" <aubrey.li@...ux.intel.com>
Cc:     Vincent Guittot <vincent.guittot@...aro.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Qais Yousef <qais.yousef@....com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 0/5] Scan for an idle sibling in a single pass

On Mon, Jan 25, 2021 at 12:29:47PM +0800, Li, Aubrey wrote:
> >>> hackbench -l 2560 -g 1 on 8 cores arm64
> >>> v5.11-rc4 : 1.355 (+/- 7.96)
> >>> + sis improvement : 1.923 (+/- 25%)
> >>> + the patch below : 1.332 (+/- 4.95)
> >>>
> >>> hackbench -l 2560 -g 256 on 8 cores arm64
> >>> v5.11-rc4 : 2.116 (+/- 4.62%)
> >>> + sis improvement : 2.216 (+/- 3.84%)
> >>> + the patch below : 2.113 (+/- 3.01%)
> >>>
> 
> 4 benchmarks reported out during weekend, with patch 3 on a x86 4s system
> with 24 cores per socket and 2 HT per core, total 192 CPUs.
> 
> It looks like mid-load has notable changes on my side:
> - netperf 50% num of threads in TCP mode has 27.25% improved
> - tbench 50% num of threads has 9.52% regression
> 

It's interesting that patch 3 would make any difference on x64 given that
it's SMT2. The scan depth should have been similar. It's somewhat expected
that it will not be a universal win, particularly once the utilisation
is high enough to spill over in sched domains (25%, 50%, 75% utilisation
being interesting on 4-socket systems). In such cases, double scanning can
still show improvements for workloads that idle rapidly like tbench and
hackbench even though it's expensive. The extra scanning gives more time
for a CPU to go idle enough to be selected which can improve throughput
but at the cost of wake-up latency,

Hopefully v4 can be tested as well which is now just a single scan.

-- 
Mel Gorman
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ