linux-kernel - Re: [PATCH v3 0/5] Scan for an idle sibling in a single pass

Message-ID: <31300317-89e0-ca5e-d095-920c6cfe8704@linux.intel.com>
Date:   Mon, 25 Jan 2021 19:37:55 +0800
From:   "Li, Aubrey" <aubrey.li@...ux.intel.com>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Vincent Guittot <vincent.guittot@...aro.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Qais Yousef <qais.yousef@....com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 0/5] Scan for an idle sibling in a single pass

On 2021/1/25 17:04, Mel Gorman wrote:
> On Mon, Jan 25, 2021 at 12:29:47PM +0800, Li, Aubrey wrote:
>>>>> hackbench -l 2560 -g 1 on 8 cores arm64
>>>>> v5.11-rc4 : 1.355 (+/- 7.96%)
>>>>> + sis improvement : 1.923 (+/- 25%)
>>>>> + the patch below : 1.332 (+/- 4.95%)
>>>>>
>>>>> hackbench -l 2560 -g 256 on 8 cores arm64
>>>>> v5.11-rc4 : 2.116 (+/- 4.62%)
>>>>> + sis improvement : 2.216 (+/- 3.84%)
>>>>> + the patch below : 2.113 (+/- 3.01%)
>>>>>
>>
>> Four benchmarks finished over the weekend, run with patch 3 on an x86
>> 4-socket system with 24 cores per socket and 2 HT per core, 192 CPUs in total.
>>
>> It looks like mid-load has notable changes on my side:
>> - netperf in TCP mode with 50% num of threads improved by 27.25%
>> - tbench with 50% num of threads regressed by 9.52%
>>
> 
> It's interesting that patch 3 would make any difference on x64 given that
> it's SMT2. The scan depth should have been similar. It's somewhat expected
> that it will not be a universal win, particularly once the utilisation
> is high enough to spill over in sched domains (25%, 50%, 75% utilisation
> being interesting on 4-socket systems). In such cases, double scanning can
> still show improvements for workloads that idle rapidly like tbench and
> hackbench even though it's expensive. The extra scanning gives more time
> for a CPU to go idle enough to be selected which can improve throughput
> but at the cost of wake-up latency.

Aha, sorry for the confusion. Since you and Vincent discussed dropping
patch 3, I only meant that I tested all 5 patches including patch 3, not
patch 3 alone.

> 
> Hopefully v4 can be tested as well which is now just a single scan.
> 

Sure. May I know which baseline v4 is based on?

Thanks,
-Aubrey
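
For reference, the CPU count and the mid-load thread counts discussed above can be worked out from the topology given in the quoted report (4 sockets, 24 cores per socket, 2 HT per core). This is only a rough sketch: the helper names are hypothetical, and it assumes that "N% of threads" means N% of the logical CPU count, which the thread does not state explicitly.

```python
# Topology as described in the quoted report.
SOCKETS = 4
CORES_PER_SOCKET = 24
THREADS_PER_CORE = 2  # SMT2 / 2 HT per core


def total_cpus(sockets, cores_per_socket, threads_per_core):
    """Number of logical CPUs on the machine."""
    return sockets * cores_per_socket * threads_per_core


def threads_at(utilisation_pct, cpus):
    """Benchmark thread count at a given utilisation level.

    Assumption (not confirmed in the thread): the thread count is
    utilisation_pct percent of the logical CPU count, rounded down.
    """
    return cpus * utilisation_pct // 100


if __name__ == "__main__":
    cpus = total_cpus(SOCKETS, CORES_PER_SOCKET, THREADS_PER_CORE)
    print(f"logical CPUs: {cpus}")
    # 25%, 50% and 75% are the utilisation points called out as
    # interesting on 4-socket systems.
    for pct in (25, 50, 75):
        print(f"{pct}% utilisation: {threads_at(pct, cpus)} threads")
```

Under that assumption, each 25% step corresponds to the logical CPUs of one socket (48 of 192).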