linux-kernel - Re: [RFC] Randomness on confidential computing platforms

Message-ID: <ZbPOi0760srv0rE0@google.com>
Date: Fri, 26 Jan 2024 07:23:55 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Cc: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>, x86@...nel.org, 
	"Theodore Ts'o" <tytso@....edu>, "Jason A. Donenfeld" <Jason@...c4.com>, 
	Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>, 
	Elena Reshetova <elena.reshetova@...el.com>, Jun Nakajima <jun.nakajima@...el.com>, 
	Tom Lendacky <thomas.lendacky@....com>, Ashish Kalra <ashish.kalra@....com>, 
	linux-coco@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [RFC] Randomness on confidential computing platforms

On Fri, Jan 26, 2024, Kirill A. Shutemov wrote:
> Problem Statement
> 
> Currently Linux RNG uses the random inputs obtained from x86
> RDRAND/RDSEED instructions (if present) during early initialization
> stage (by mixing the obtained input into the random pool via
> _mix_pool_bytes()), as well as for seeding/reseeding ChaCha-based CRNG.
> When the calls to both RDRAND/RDSEED fail (including RDRAND internal
> retries), the timing-based fallbacks are used in the latter case, and
> during the early boot case this source of entropy input is simply
> skipped. Overall Linux RNG has many other sources of entropy that it
> uses (also depending on what HW is used), but the dominating one is
> interrupts.
> 
> In a Confidential Computing Guest threat model, given the absence of any
> special trusted HW for the secure entropy input, RDRAND/RDSEED
> instructions are the only entropy source that is unobservable outside of
> Confidential Computing Guest TCB. However, with enough pressure on these
> instructions from multiple cores (see Intel SDM, Volume 1, Section
> 7.3.17, “Random Number Generator Instructions”), they can be made to
> fail on purpose and force the Confidential Computing Guest Linux RNG to
> use only Host/VMM controlled entropy sources.
> 
> Solution options
> 
> There are several possible solutions to this problem and the intention
> of this RFC is to initiate a joint discussion. Here are some options
> that have been considered:
> 
> 1. Do nothing and accept the risk.
> 2. Force endless looping on RDRAND/RDSEED instructions when run in a
>    Confidential Computing Guest (this patch). This option turns the
>    attack against the quality of cryptographic randomness provided by
>    Confidential Computing Guest’s Linux RNG into a DoS attack against
>    the Confidential Computing Guest itself (DoS attack is out of scope
>    for the Confidential Computing threat model).
> 3. Panic after enough re-tries of RDRAND/RDSEED instructions fail.
>    Another DoS variant against the Guest.
> 4. Exit to the host/VMM with an error indication after a Confidential
>    Computing Guest failed to obtain random input from RDRAND/RDSEED
>    instructions after a reasonable number of retries. This option allows
>    host/VMM to take some corrective action for cases when the load on
>    RDRAND/RDSEED instructions has been put by another actor, i.e. the
>    other guest VM. The exit to host/VMM in such cases can be made
>    transparent for the Confidential Computing Guest in the TDX case with
>    the assistance of the TDX module component.

Hell no.  Develop better hardware if you want to guarantee forward progress.
Don't push more complexity into the host stack for something that in all likelihood
will never happen outside of buggy software or hardware.

> 5. Any other, better option?

Give the admin the option to choose between "I don't care, carry on with less
randomness" and "I'm paranoid, panic, panic, panic!".  In other words, let the
admin choose between #1 and #3 at boot time.  You could probably even let the
admin control the number of retries, though that's probably a bit excessive.

And don't tie it to CoCo VMs, e.g. if someone is relying on randomness for a bare
metal workload, they might prefer to panic if hardware is acting funky.
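
To make the #1 vs. #3 choice concrete, here is a rough userspace sketch of a
bounded retry loop with an admin-selected failure policy. Everything in it is
hypothetical: the names (rng_fail_policy, get_hw_random, flaky_read) are made
up for illustration and are not existing kernel symbols or boot parameters,
and the callback merely stands in for RDRAND/RDSEED so the logic can be
exercised without the instructions. A real kernel implementation would call
panic() where this returns RNG_FATAL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy knob, e.g. parsed from a boot parameter. */
enum rng_fail_policy {
	RNG_FAIL_CONTINUE,	/* option #1: carry on with less randomness */
	RNG_FAIL_PANIC,		/* option #3: treat exhausted retries as fatal */
};

enum rng_result {
	RNG_OK,			/* hardware source delivered a value */
	RNG_DEGRADED,		/* retries exhausted, caller continues without it */
	RNG_FATAL,		/* retries exhausted, caller should panic */
};

/* Stand-in for RDRAND/RDSEED: returns true and fills *out on success. */
typedef bool (*rng_source_fn)(uint64_t *out, void *ctx);

/*
 * Try the hardware source up to max_retries times. The policy is not
 * tied to CoCo guests; it applies equally to bare metal.
 */
static enum rng_result get_hw_random(rng_source_fn src, void *ctx,
				     unsigned int max_retries,
				     enum rng_fail_policy policy,
				     uint64_t *out)
{
	for (unsigned int i = 0; i < max_retries; i++) {
		if (src(out, ctx))
			return RNG_OK;
	}

	return policy == RNG_FAIL_PANIC ? RNG_FATAL : RNG_DEGRADED;
}

/* Test source: fails a fixed number of times, then succeeds. */
struct flaky_source {
	unsigned int failures_left;
};

static bool flaky_read(uint64_t *out, void *ctx)
{
	struct flaky_source *s = ctx;

	if (s->failures_left > 0) {
		s->failures_left--;
		return false;
	}

	*out = 0x1234;	/* fixed value, for testing only */
	return true;
}
```

With a source that fails three times and a budget of ten retries, the call
succeeds regardless of policy. With a source that never recovers, the same
call yields RNG_DEGRADED or RNG_FATAL depending solely on what the admin
picked, and nothing extra gets pushed into the host stack.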