linux-kernel - Re: [PATCH 7/8] x86/cpu/intel: enable X86_FEATURE_NT

Message-ID: <20201014153127.GB1424414@gmail.com>
Date:   Wed, 14 Oct 2020 17:31:27 +0200
From:   Ingo Molnar <mingo@...nel.org>
To:     Ankur Arora <ankur.a.arora@...cle.com>
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        kirill@...temov.name, mhocko@...nel.org,
        boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
        Tony Luck <tony.luck@...el.com>,
        Sean Christopherson <sean.j.christopherson@...el.com>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Xiaoyao Li <xiaoyao.li@...el.com>,
        Fenghua Yu <fenghua.yu@...el.com>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: [PATCH 7/8] x86/cpu/intel: enable X86_FEATURE_NT_GOOD on Intel
 Broadwellx


* Ankur Arora <ankur.a.arora@...cle.com> wrote:

> System:           Oracle X6-2
> CPU:              2 nodes * 10 cores/node * 2 threads/core
> 		  Intel Xeon E5-2630 v4 (Broadwellx, 6:79:1)
> Memory:           256 GB evenly split between nodes
> Microcode:        0xb00002e
> scaling_governor: performance
> L3 size:          25MB
> intel_pstate/no_turbo: 1
> 
> Performance comparison of 'perf bench mem memset -l 1' for x86-64-stosb
> (X86_FEATURE_ERMS) and x86-64-movnt (X86_FEATURE_NT_GOOD):
> 
>               x86-64-stosb (5 runs)     x86-64-movnt (5 runs)       speedup
>               -----------------------   -----------------------     -------
>      size       BW        (   pstdev)          BW   (   pstdev)
> 
>      16MB      17.35 GB/s ( +- 9.27%)    11.83 GB/s ( +- 0.19%)     -31.81%
>     128MB       5.31 GB/s ( +- 0.13%)    11.72 GB/s ( +- 0.44%)    +121.84%
>    1024MB       5.42 GB/s ( +- 0.13%)    11.78 GB/s ( +- 0.03%)    +117.34%
>    4096MB       5.41 GB/s ( +- 0.41%)    11.76 GB/s ( +- 0.07%)    +117.37%

> +	if (c->x86 == 6 && c->x86_model == INTEL_FAM6_BROADWELL_X)
> +		set_cpu_cap(c, X86_FEATURE_NT_GOOD);

While I appreciate the careful measurements you've done to isolate the 
microarchitectures where non-temporal stores are slow, I think this 
opt-in approach doesn't scale and is hard to maintain.

Instead I'd suggest enabling this by default everywhere, and creating an 
X86_FEATURE_NT_BAD quirk table for the bad microarchitectures.

This means that with new microarchitectures we'd get automatic enablement, 
and hopefully chip testing would identify cases where performance isn't as 
good.

I.e. the 'trust but verify' method.
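
A rough userspace sketch of the opt-out idea, purely for illustration: it
is not kernel code and not part of the patch. The struct, the table and
both helper names are hypothetical, and the table entries are fictitious
placeholders rather than measured results. The point is only that an
unlisted (e.g. new) model gets non-temporal stores by default, and a model
has to be added to the table to opt out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU identity: family/model as reported by CPUID. */
struct cpu_id {
	uint8_t family;
	uint8_t model;
};

/*
 * Hypothetical "NT_BAD" quirk table. The entries below are fictitious
 * placeholders (family 0xFF does not name a real CPU); a real table
 * would list only models where measurements showed slow NT stores.
 */
static const struct cpu_id nt_bad_models[] = {
	{ .family = 0xFF, .model = 0x01 },	/* placeholder */
	{ .family = 0xFF, .model = 0x02 },	/* placeholder */
};

/* Return true if the CPU appears in the quirk table. */
static bool cpu_has_nt_bad(struct cpu_id id)
{
	size_t n = sizeof(nt_bad_models) / sizeof(nt_bad_models[0]);

	for (size_t i = 0; i < n; i++) {
		if (nt_bad_models[i].family == id.family &&
		    nt_bad_models[i].model == id.model)
			return true;
	}
	return false;
}

/* Default-on policy: use non-temporal stores unless quirked off. */
static bool use_movnt(struct cpu_id id)
{
	return !cpu_has_nt_bad(id);
}
```

With this shape, Broadwell-X (family 6, model 79 in the quoted system
description) needs no explicit entry: it is enabled simply because it is
not listed as bad.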

Thanks,

	Ingo