Open Source and information security mailing list archives
 
Date:   Wed, 14 Oct 2020 12:23:57 -0700
From:   Ankur Arora <ankur.a.arora@...cle.com>
To:     Ingo Molnar <mingo@...nel.org>
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        kirill@...temov.name, mhocko@...nel.org,
        boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
        Tony Luck <tony.luck@...el.com>,
        Sean Christopherson <sean.j.christopherson@...el.com>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Xiaoyao Li <xiaoyao.li@...el.com>,
        Fenghua Yu <fenghua.yu@...el.com>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: [PATCH 7/8] x86/cpu/intel: enable X86_FEATURE_NT_GOOD on Intel Broadwellx

On 2020-10-14 8:31 a.m., Ingo Molnar wrote:
> 
> * Ankur Arora <ankur.a.arora@...cle.com> wrote:
> 
>> System:           Oracle X6-2
>> CPU:              2 nodes * 10 cores/node * 2 threads/core
>> 		  Intel Xeon E5-2630 v4 (Broadwellx, 6:79:1)
>> Memory:           256 GB evenly split between nodes
>> Microcode:        0xb00002e
>> scaling_governor: performance
>> L3 size:          25MB
>> intel_pstate/no_turbo: 1
>>
>> Performance comparison of 'perf bench mem memset -l 1' for x86-64-stosb
>> (X86_FEATURE_ERMS) and x86-64-movnt (X86_FEATURE_NT_GOOD):
>>
>>                x86-64-stosb (5 runs)     x86-64-movnt (5 runs)       speedup
>>                -----------------------   -----------------------     -------
>>       size       BW        (   pstdev)          BW   (   pstdev)
>>
>>       16MB      17.35 GB/s ( +- 9.27%)    11.83 GB/s ( +- 0.19%)     -31.81%
>>      128MB       5.31 GB/s ( +- 0.13%)    11.72 GB/s ( +- 0.44%)    +121.84%
>>     1024MB       5.42 GB/s ( +- 0.13%)    11.78 GB/s ( +- 0.03%)    +117.34%
>>     4096MB       5.41 GB/s ( +- 0.41%)    11.76 GB/s ( +- 0.07%)    +117.37%
> 
>> +	if (c->x86 == 6 && c->x86_model == INTEL_FAM6_BROADWELL_X)
>> +		set_cpu_cap(c, X86_FEATURE_NT_GOOD);
> 
> So while I agree with how you've done careful measurements to isolate bad
> microarchitectures where non-temporal stores are slow, I do think this
> approach of opt-in doesn't scale and is hard to maintain.
> 
> Instead I'd suggest enabling this by default everywhere, and creating a
> X86_FEATURE_NT_BAD quirk table for the bad microarchitectures.
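The speedup column in the quoted table appears to be the relative bandwidth change of x86-64-movnt over x86-64-stosb. The C sketch below assumes speedup = (BW_movnt / BW_stosb - 1) * 100, applied to the rounded GB/s figures shown. The helpers speedup_pct() and print_speedups() are hypothetical, not part of perf or the patch. Because the inputs are rounded, and because the exact formula used for the column is not stated, the recomputed values may not reproduce every reported figure exactly.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Relative bandwidth change of movnt over stosb, in percent.
 * Assumption: speedup = (movnt / stosb - 1) * 100.
 */
double speedup_pct(double stosb_gbps, double movnt_gbps)
{
	assert(stosb_gbps > 0.0);	/* guard against division by zero */
	return (movnt_gbps / stosb_gbps - 1.0) * 100.0;
}

/* Recompute the speedup for each row, using the rounded GB/s values. */
void print_speedups(void)
{
	static const struct {
		const char *size;
		double stosb;	/* x86-64-stosb BW, GB/s */
		double movnt;	/* x86-64-movnt BW, GB/s */
	} rows[] = {
		{ "16MB",   17.35, 11.83 },
		{ "128MB",   5.31, 11.72 },
		{ "1024MB",  5.42, 11.78 },
		{ "4096MB",  5.41, 11.76 },
	};

	for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
		printf("%8s  %+8.2f%%\n", rows[i].size,
		       speedup_pct(rows[i].stosb, rows[i].movnt));
}
```

Calling print_speedups() from a main() prints one recomputed percentage per buffer size. A negative value (as in the 16MB row) means movnt was slower than stosb.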
Okay, some kind of quirk table is a great idea. It also means there's a
single place to keep this, rather than having it scattered all over the
code.
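The default-on scheme discussed here can be sketched in userspace C: NT stores are treated as good unless the CPU's family/model appears in a quirk table of bad microarchitectures. Everything in the sketch is hypothetical: struct cpu_id, nt_bad_quirks[], cpu_in_table() and nt_good() are illustrative names, and the single table entry is a placeholder, not a measured result. In the kernel, CPU-model tables of this kind are typically expressed as struct x86_cpu_id arrays checked with x86_match_cpu(); this sketch only mimics that shape.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CPU identity (the kernel uses struct cpuinfo_x86). */
struct cpu_id {
	unsigned int family;
	unsigned int model;
};

/*
 * Hypothetical X86_FEATURE_NT_BAD quirk table: microarchitectures where
 * measurements showed non-temporal stores to be slow.
 * The entry below is a PLACEHOLDER, not a measured result.
 */
static const struct cpu_id nt_bad_quirks[] = {
	{ 6, 0xff },	/* placeholder family/model */
};

/* Return true if (family, model) matches any entry in tbl[0..n). */
static bool cpu_in_table(const struct cpu_id *c,
			 const struct cpu_id *tbl, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].family == c->family && tbl[i].model == c->model)
			return true;
	return false;
}

/*
 * Default-on: every CPU gets NT stores unless it is quirked, so new
 * microarchitectures are enabled automatically ("trust but verify").
 */
bool nt_good(const struct cpu_id *c)
{
	assert(c != NULL);
	return !cpu_in_table(c, nt_bad_quirks,
			     sizeof(nt_bad_quirks) / sizeof(nt_bad_quirks[0]));
}
```

With this shape, a Broadwellx CPU (family 6, model 79) gets NT stores without a per-model opt-in check; only listed entries are excluded.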

That also simplifies my handling of features like X86_FEATURE_CLZERO.
I was concerned that, if you squint a bit, it looks like an alias for
X86_FEATURE_NT_GOOD, which seemed ugly.

> 
> This means that with new microarchitectures we'd get automatic enablement,
> and hopefully chip testing would identify cases where performance isn't as
> good.
Makes sense to me. A first-class citizen, as it were...

Thanks for reviewing btw.

Ankur

> 
> I.e. the 'trust but verify' method.


> 
> Thanks,
> 
> 	Ingo
> 
