linux-kernel - Re: [RFC] Randomness on confidential computing platforms


Message-ID: <ajyeah2brs4mfogkxmp7wdteoesevpsvpwt7pxz7k5ifo76ihk@imfaevj2s4ms>
Date: Mon, 29 Jan 2024 12:27:07 +0200
From: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, 
	Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>, 
	"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org, Theodore Ts'o <tytso@....edu>, 
	"Jason A. Donenfeld" <Jason@...c4.com>, 
	Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>, Elena Reshetova <elena.reshetova@...el.com>, 
	Jun Nakajima <jun.nakajima@...el.com>, Tom Lendacky <thomas.lendacky@....com>, 
	Ashish Kalra <ashish.kalra@....com>, linux-coco@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [RFC] Randomness on confidential computing platforms

On Fri, Jan 26, 2024 at 07:23:55AM -0800, Sean Christopherson wrote:
> On Fri, Jan 26, 2024, Kirill A. Shutemov wrote:
> > Problem Statement
> > 
> > Currently Linux RNG uses the random inputs obtained from x86
> > RDRAND/RDSEED instructions (if present) during early initialization
> > stage (by mixing the obtained input into the random pool via
> > _mix_pool_bytes()), as well as for seeding/reseeding ChaCha-based CRNG.
> > When the calls to both RDRAND/RDSEED fail (including RDRAND internal
> > retries), the timing-based fallbacks are used in the latter case, and
> > during the early boot case this source of entropy input is simply
> > skipped. Overall Linux RNG has many other sources of entropy that it
> > uses (also depending on what HW is used), but the dominating one is
> > interrupts.
> > 
> > In a Confidential Computing Guest threat model, given the absence of any
> > special trusted HW for the secure entropy input, RDRAND/RDSEED
> > instructions are the only entropy source that is unobservable outside of
> > Confidential Computing Guest TCB. However, with enough pressure on these
> > instructions from multiple cores (see Intel SDM, Volume 1, Section
> > 7.3.17, “Random Number Generator Instructions”), they can be made to
> > fail on purpose and force the Confidential Computing Guest Linux RNG to
> > use only Host/VMM controlled entropy sources.
> > 
> > Solution options
> > 
> > There are several possible solutions to this problem and the intention
> > of this RFC is to initiate a joint discussion. Here are some options
> > that have been considered:
> > 
> > 1. Do nothing and accept the risk.
> > 2. Force endless looping on RDRAND/RDSEED instructions when run in a
> >    Confidential Computing Guest (this patch). This option turns the
> >    attack against the quality of cryptographic randomness provided by
> >    Confidential Computing Guest’s Linux RNG into a DoS attack against
> >    the Confidential Computing Guest itself (DoS attack is out of scope
> >    for the Confidential Computing threat model).
> > 3. Panic after enough re-tries of RDRAND/RDSEED instructions fail.
> >    Another DoS variant against the Guest.
> > 4. Exit to the host/VMM with an error indication after a Confidential
> >    Computing Guest failed to obtain random input from RDRAND/RDSEED
> >    instructions after a reasonable number of retries. This option allows
> >    host/VMM to take some corrective action for cases when the load on
> >    RDRAND/RDSEED instructions has been put by another actor, i.e. the
> >    other guest VM. The exit to host/VMM in such cases can be made
> >    transparent for the Confidential Computing Guest in the TDX case with
> >    the assistance of the TDX module component.
> 
> Hell no.  Develop better hardware if you want to guarantee forward progress.
> Don't push more complexity into the host stack for something that in all likelihood
> will never happen outside of buggy software or hardware.

My idea for this option was to make TDH.VP.ENTER return TDX_RND_NO_ENTROPY
in such a case. The VMM can simply retry, or maybe schedule another workload
and let the entropy pool recover.

I don't think making RDRAND/RDSEED never-fail at the HW level is feasible.
And it is definitely not guaranteed by the current architecture.
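
To make the retry-or-reschedule idea concrete, here is a minimal sketch of
what VMM-side handling could look like. It is only an illustration under
stated assumptions: the status enum, vmm_enter_guest(), the callback types
and the fake vCPU are all hypothetical names, the enum values are
placeholders rather than real TDX module encodings, and nothing here is
existing KVM or TDX module code.

```c
/* Hypothetical status values; placeholders, not real TDX module encodings. */
enum vp_enter_status {
	VP_ENTER_OK,
	VP_ENTER_NO_ENTROPY,	/* stands in for TDX_RND_NO_ENTROPY */
};

/* Callbacks modelling TDH.VP.ENTER and "go run something else for a while". */
typedef enum vp_enter_status (*vp_enter_fn)(void *ctx);
typedef void (*vp_yield_fn)(void *ctx);

struct enter_result {
	enum vp_enter_status status;
	unsigned int attempts;
};

/*
 * Re-enter the guest until the entry stops reporting "no entropy" or the
 * attempt budget runs out, yielding after each failed attempt so that the
 * entropy pool has a chance to recover.
 */
static struct enter_result vmm_enter_guest(vp_enter_fn enter, vp_yield_fn yield,
					   void *ctx, unsigned int max_attempts)
{
	struct enter_result res = { VP_ENTER_NO_ENTROPY, 0 };

	while (res.attempts < max_attempts) {
		res.attempts++;
		res.status = enter(ctx);
		if (res.status != VP_ENTER_NO_ENTROPY)
			break;
		if (yield)
			yield(ctx);
	}
	return res;
}

/* Toy vCPU for demonstration: fails a fixed number of entries, then succeeds. */
struct fake_vcpu {
	unsigned int failures_left;
	unsigned int yields;
};

static enum vp_enter_status fake_enter(void *ctx)
{
	struct fake_vcpu *v = ctx;

	if (v->failures_left) {
		v->failures_left--;
		return VP_ENTER_NO_ENTROPY;
	}
	return VP_ENTER_OK;
}

static void fake_yield(void *ctx)
{
	struct fake_vcpu *v = ctx;

	v->yields++;
}
```

The attempt budget is a design choice in this sketch, not part of the
proposal: a real VMM might equally retry forever or report the failure to
userspace once the budget is spent.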

> > 5. Any other better option?
> 
> Give the admin the option to choose between "I don't care, carry-on with less
> randomness" and "I'm paranoid, panic, panic, panic!".  In other words, let the
> admin choose between #1 and #3 at boot time.  You could probably even let the
> admin control the number of retries, though that's probably a bit excessive.
> 
> And don't tie it to CoCo VMs, e.g. if someone is relying on randomness for a bare
> metal workload, they might prefer to panic if hardware is acting funky.

If we go this path, I still think the option has to have a strict default
for CoCo VMs, as they don't have other options to fall back to.
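
As a rough sketch of how such a knob could behave, the code below picks the
effective policy from an admin setting and falls back to "panic" for CoCo
guests and "carry on" otherwise. Everything in it is an assumption made for
illustration: the enum names, effective_policy(), seed_from_hw() and the
fake RNG callback are hypothetical and do not correspond to an existing
kernel parameter or to random.c code.

```c
#include <stdbool.h>

/* Hypothetical admin-selectable policy; UNSET means "use the default". */
enum rng_fail_policy { RNG_FAIL_UNSET, RNG_FAIL_CONTINUE, RNG_FAIL_PANIC };

/* What the caller should do after trying the hardware RNG. */
enum rng_action { RNG_ACT_USE_VALUE, RNG_ACT_FALLBACK, RNG_ACT_PANIC };

/* An explicit admin choice wins; otherwise CoCo guests default to strict. */
static enum rng_fail_policy effective_policy(enum rng_fail_policy admin_choice,
					     bool is_coco_guest)
{
	if (admin_choice != RNG_FAIL_UNSET)
		return admin_choice;
	return is_coco_guest ? RNG_FAIL_PANIC : RNG_FAIL_CONTINUE;
}

/* Callback modelling one RDRAND/RDSEED attempt; returns true on success. */
typedef bool (*hw_rng_fn)(unsigned long *out, void *ctx);

/*
 * Try the hardware RNG up to 'retries' times. On persistent failure, either
 * tell the caller to panic (option #3) or to carry on with the other entropy
 * sources (option #1), depending on the policy passed in.
 */
static enum rng_action seed_from_hw(hw_rng_fn rng, void *ctx,
				    unsigned int retries,
				    enum rng_fail_policy policy,
				    unsigned long *out)
{
	for (unsigned int i = 0; i < retries; i++) {
		if (rng(out, ctx))
			return RNG_ACT_USE_VALUE;
	}
	return policy == RNG_FAIL_PANIC ? RNG_ACT_PANIC : RNG_ACT_FALLBACK;
}

/* Toy RNG for demonstration: fails a fixed number of times, then succeeds. */
struct fake_rng {
	unsigned int failures_left;
};

static bool fake_rdrand(unsigned long *out, void *ctx)
{
	struct fake_rng *r = ctx;

	if (r->failures_left) {
		r->failures_left--;
		return false;
	}
	*out = 0x1234UL;	/* fixed value, demonstration only */
	return true;
}
```

Whether an explicit "continue" setting should be honoured inside a CoCo
guest at all is left open here; the sketch only shows the strict default.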

-- 
  Kiryl Shutsemau / Kirill A. Shutemov