linux-kernel - Re: [PATCH 1/2] x86/random: Retry on RDSEED failure

Message-ID: <20240201045710.GD2356784@mit.edu>
Date: Wed, 31 Jan 2024 23:57:10 -0500
From: "Theodore Ts'o" <tytso@....edu>
To: "Jason A. Donenfeld" <Jason@...c4.com>
Cc: "Reshetova, Elena" <elena.reshetova@...el.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
        Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "H. Peter Anvin" <hpa@...or.com>, "x86@...nel.org" <x86@...nel.org>,
        Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        "Nakajima, Jun" <jun.nakajima@...el.com>,
        Tom Lendacky <thomas.lendacky@....com>,
        "Kalra, Ashish" <ashish.kalra@....com>,
        Sean Christopherson <seanjc@...gle.com>,
        "linux-coco@...ts.linux.dev" <linux-coco@...ts.linux.dev>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] x86/random: Retry on RDSEED failure

On Wed, Jan 31, 2024 at 07:01:01PM +0100, Jason A. Donenfeld wrote:
> So if this is what we're congealing around, I guess we can:
> 
> 0) Leave RDSEED alone and focus on RDRAND.
> 1) Add `WARN_ON_ONCE(in_early_boot);` to the failure path of RDRAND
> (and simply hope this doesn't get exploited for guest-guest boot DoS).
> 2) Loop forever in RDRAND on CoCo VMs, post-boot, with the comments
> and variable naming making it clear that this is a hardware bug
> workaround, not a "feature" added for "extra security".
> 3) Complain loudly to Intel and get them to fix the hardware.
> 
> Though, a large part of me would really like to skip that step (2),
> first because it's a pretty gross bandaid that adds lots of
> complexity, and second because it'll make (3) less poignant

If we need to loop for more than, say, 10 seconds in a CoCo VM, I'd
just panic with a repeated RDRAND failure message.  This makes the
point of (3) that much more pointed, and it's better than having a
CoCo VM mysteriously hang in the face of a DoS attack.
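
The bounded retry described above can be sketched in user-space C.
This is only an illustration under stated assumptions, not kernel
code: the RDRAND attempt and the clock are passed in as hypothetical
callbacks (try_rng, now_ms), and a SEED_TIMEOUT return marks the point
where a kernel would panic with an RDRAND failure message instead of
continuing to spin.  The fake_clock, always_fail, and always_ok
helpers are stand-ins for exercising the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback types: stand-ins for one RDRAND attempt
 * and for a monotonic millisecond clock. */
typedef bool (*rng_fn)(uint64_t *out);
typedef uint64_t (*clock_ms_fn)(void);

enum seed_status { SEED_OK, SEED_TIMEOUT };

/*
 * Retry try_rng until it succeeds or timeout_ms has elapsed.
 * On SEED_TIMEOUT the caller is expected to fail loudly (a kernel
 * would panic here) rather than hang silently.
 */
static enum seed_status get_seed_with_deadline(rng_fn try_rng,
					       clock_ms_fn now_ms,
					       uint64_t timeout_ms,
					       uint64_t *out)
{
	uint64_t start;

	assert(try_rng && now_ms && out);
	start = now_ms();

	for (;;) {
		if (try_rng(out))
			return SEED_OK;
		if (now_ms() - start >= timeout_ms)
			return SEED_TIMEOUT;
	}
}

/* Illustrative stand-ins: a simulated clock that advances one
 * second per reading, a source that always fails, and a source
 * that always succeeds. */
static uint64_t fake_time_ms;

static uint64_t fake_clock(void)
{
	fake_time_ms += 1000;
	return fake_time_ms;
}

static bool always_fail(uint64_t *out)
{
	(void)out;
	return false;
}

static bool always_ok(uint64_t *out)
{
	*out = 42;
	return true;
}
```

With the simulated clock and a 10000 ms timeout, a source that never
succeeds is given up on once ten simulated seconds have passed, while
a working source returns on the first attempt.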

I'll note that it should be relatively easy for Intel to make sure
that when there is an undue draw on RDRAND, the hardware enforces a
"fair share" mode where each of the N cores gets at most 1/N of the
available entropy.  So suppose you have a single-core CoCo VM trying
to boot on a 256-core machine, and the evil attacker has purchased 255
cores' worth of VMs, all of which are busy-looping on RDRAND.  While
the CoCo VM is booting and looping on RDRAND, it should be getting
1/256th of the available RDRAND output.  Since it is only trying to
grab enough randomness to seed the /dev/random CRNG, if it can't get
enough randomness in 10 seconds, well, Intel's customers should be
finding another vendor's CPU that can do a better job.
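
The 1/N arithmetic can be made concrete with a small C sketch.  The
function names are hypothetical, and the inputs used in the note that
follows (an aggregate RDRAND rate of 1,000,000 bits per second and a
256-bit seed) are placeholder assumptions chosen for illustration, not
measured or documented figures.

```c
#include <assert.h>
#include <stdint.h>

/* Bits per second one core may draw under a strict 1/N split of
 * the aggregate rate total_bps.  Requires ncores > 0. */
static uint64_t fair_share_bps(uint64_t total_bps, unsigned int ncores)
{
	assert(ncores > 0);
	return total_bps / ncores;
}

/* Milliseconds (rounded up) needed to collect seed_bits at
 * share_bps; UINT64_MAX if the share is zero. */
static uint64_t ms_to_seed(uint64_t seed_bits, uint64_t share_bps)
{
	if (share_bps == 0)
		return UINT64_MAX;
	return (seed_bits * 1000 + share_bps - 1) / share_bps;
}
```

Under those placeholder inputs, fair_share_bps(1000000, 256) gives
3906 bits per second for the single booting core, and
ms_to_seed(256, 3906) gives 66 ms, far below a 10 second limit.  The
real margin depends entirely on the hardware's actual aggregate rate.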

			     	      	 - Ted