Message-ID: <94BA6EDC-5C44-490D-B6F5-0E38C8822F7C@zytor.com>
Date: Thu, 16 Oct 2025 12:39:43 -0700
From: "H. Peter Anvin" <hpa@...or.com>
To: Gregory Price <gourry@...rry.net>, x86@...nel.org
CC: linux-kernel@...r.kernel.org, tglx@...utronix.de, mingo@...hat.com,
        bp@...en8.de, dave.hansen@...ux.intel.com, peterz@...radead.org,
        mario.limonciello@....com, riel@...riel.com, yazen.ghannam@....com,
        me@...aill.net, kai.huang@...el.com, sandipan.das@....com,
        darwi@...utronix.de, stable@...nel.org
Subject: Re: [PATCH] x86/amd: Disable RDSEED on AMD Zen5 Turin because of an error.

On October 16, 2025 12:12:31 PM PDT, Gregory Price <gourry@...rry.net> wrote:
>On Thu, Oct 16, 2025 at 02:21:07PM -0400, Gregory Price wrote:
>> Under unknown architectural conditions, Zen5 chips running rdseed
>> can produce (val=0,CF=1) as a "random" result over 10% of the time
>> (when rdseed is successful).  CF=1 indicates success, while val=0
>> is typically only produced when rdseed fails (CF=0).
>> 
>> This suggests there is an architectural issue which causes rdseed
>> to misclassify a failure as a success under unknown conditions.
>> 
>> This was reproduced reliably by launching 2 threads per available
>> core: 1 thread per core hammering on RDSEED, and 1 thread per core
>> collectively eating and hammering on ~90% of memory.
>> 
>> Fix was modeled after a different RDSEED issue in Zen2 Cyan Skillfish.
>> 
>> Link: https://lore.kernel.org/all/20250715130819.461718765@linuxfoundation.org/
>> Cc: <stable@...nel.org>
>> Signed-off-by: Gregory Price <gourry@...rry.net>
>> ---
>>  arch/x86/kernel/cpu/amd.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>> 
>> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
>> index 5398db4dedb4..9c3b2f010f8c 100644
>> --- a/arch/x86/kernel/cpu/amd.c
>> +++ b/arch/x86/kernel/cpu/amd.c
>> @@ -1037,6 +1037,12 @@ static void init_amd_zen4(struct cpuinfo_x86 *c)
>>  
>>  static void init_amd_zen5(struct cpuinfo_x86 *c)
>>  {
>> +	/* Disable RDSEED on AMD Turin because of an error. */
>> +	if (c->x86_model == 0x11 && c->x86_stepping == 0x0) {
>
>After re-examining the results, this was also observed on
>
>  c->x86_model == 0x2
>
>Maybe this should just be disabled for all of Zen5?
>I will wait for comment.
>
>In a stress test (link) I found that my Zen5 chips have a bizarrely
>low rdseed success rate anyway - so it doesn't even seem useful.
>
>./rdseed_stress_single_core
>RDRAND: 100.00%, RDSEED: 3.48%
>
>./rdseed_stress_multi_thread
>RDRAND: 99.99%, RDSEED: 0.33%
>
>Link: https://lore.kernel.org/all/Zbjw5hRHr_E6k18r@zx2c4.com/
>
>---
>
>diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
>index 9c3b2f010f8c..54f07514674a 100644
>--- a/arch/x86/kernel/cpu/amd.c
>+++ b/arch/x86/kernel/cpu/amd.c
>@@ -1038,7 +1038,8 @@ static void init_amd_zen4(struct cpuinfo_x86 *c)
> static void init_amd_zen5(struct cpuinfo_x86 *c)
> {
>        /* Disable RDSEED on AMD Turin because of an error. */
>-       if (c->x86_model == 0x11 && c->x86_stepping == 0x0) {
>+       if ((c->x86_model == 0x11 || c->x86_model == 0x2) &&
>+           (c->x86_stepping == 0x0)) {
>                clear_cpu_cap(c, X86_FEATURE_RDSEED);
>                msr_clear_bit(MSR_AMD64_CPUID_FN_7, 18);
>                pr_emerg("RDSEED is not reliable on this platform; disabling.\n");

Let's be blunt (and this applies regardless of vendor; the same thing would apply to Intel chips): RDSEED *must not* be allowed to fail silently. Period. The whole *point* of RDSEED is to unconditionally deliver a fully entropic random result.

This affects user space applications, not just the kernel.

As such, it is absolutely necessary to be conservative here. The vendor (in this case AMD) can then root-cause the failure and provide a narrower blacklist when the true extent is known.
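
To illustrate the (val=0, CF=1) statistic discussed in the quoted patch, here is a minimal sketch in C of how such outcomes could be counted. Everything in it is an assumption made for illustration: `seed_step_fn`, `struct seed_stats`, `sample_seed_source()` and the simulated `fake_faulty_step()` are hypothetical names, not kernel or libc APIs, and this is not the stress test referenced in the thread. On real hardware the step function could wrap something like the `_rdseed64_step()` intrinsic; the simulated source only exists so the counting logic can be exercised without an affected CPU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical abstraction of one RDSEED attempt: the return value models
 * the carry flag (true means CF=1, i.e. reported success) and *val models
 * the destination register. */
typedef bool (*seed_step_fn)(uint64_t *val);

/* Hypothetical tally of outcomes over a sampling run. */
struct seed_stats {
    size_t attempts;       /* total calls made */
    size_t successes;      /* calls that reported CF=1 */
    size_t zero_successes; /* CF=1 with val=0: the suspicious case */
};

/* Call the step function n times and tally the outcomes. */
void sample_seed_source(seed_step_fn step, size_t n, struct seed_stats *st)
{
    st->attempts = 0;
    st->successes = 0;
    st->zero_successes = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t v = 0;

        st->attempts++;
        if (!step(&v))
            continue;              /* honest failure (CF=0) */
        st->successes++;
        if (v == 0)
            st->zero_successes++;  /* reported success, zero value */
    }
}

/* Simulated faulty source, for illustration only: every 4th call fails
 * honestly (CF=0); of the remaining calls, every 5th one reports success
 * with val=0; all others return a nonzero placeholder value. */
bool fake_faulty_step(uint64_t *val)
{
    static uint64_t calls;

    calls++;
    if (calls % 4 == 0) {
        *val = 0;
        return false;
    }
    if (calls % 5 == 0) {
        *val = 0;
        return true;
    }
    *val = 0x1234 + calls;
    return true;
}
```

A healthy 64-bit source should return zero on success with probability 2^-64, so any measurable `zero_successes` count points at a fault. Note that filtering out zero in the caller is not a fix: it would only hide the one symptom that happens to be visible, which is why disabling the instruction is the conservative choice.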

