Open Source and information security mailing list archives
 
Date: Tue, 30 Jan 2024 10:40:24 -0800
From: "H. Peter Anvin" <hpa@...or.com>
To: Daniel P. Berrangé <berrange@...hat.com>,
        "Jason A. Donenfeld" <Jason@...c4.com>
CC: Dave Hansen <dave.hansen@...el.com>,
        "Reshetova, Elena" <elena.reshetova@...el.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
        Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "x86@...nel.org" <x86@...nel.org>, "Theodore Ts'o" <tytso@....edu>,
        Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        "Nakajima, Jun" <jun.nakajima@...el.com>,
        Tom Lendacky <thomas.lendacky@....com>,
        "Kalra, Ashish" <ashish.kalra@....com>,
        Sean Christopherson <seanjc@...gle.com>,
        "linux-coco@...ts.linux.dev" <linux-coco@...ts.linux.dev>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] x86/random: Issue a warning if RDRAND or RDSEED fails

On January 30, 2024 10:05:59 AM PST, "Daniel P. Berrangé" <berrange@...hat.com> wrote:
>On Tue, Jan 30, 2024 at 06:49:15PM +0100, Jason A. Donenfeld wrote:
>> On Tue, Jan 30, 2024 at 6:32 PM Dave Hansen <dave.hansen@...el.com> wrote:
>> >
>> > On 1/30/24 05:45, Reshetova, Elena wrote:
>> > >> You're the Intel employee so you can find out about this with much
>> > >> more assurance than me, but I understand the sentence above to be _way
>> > >> more_ true for RDRAND than for RDSEED. If your informed opinion is,
>> > >> "RDRAND failing can only be due to totally broken hardware"
>> > > No, this is not the case per Intel SDM. I think we can live under a simple
>> > > assumption that both of these instructions can fail not just due to broken
>> > > HW, but also due to enough pressure put into the whole DRBG construction
>> > > that supplies random numbers via RDRAND/RDSEED.
>> >
>> > I don't think the SDM is the right thing to look at for guidance here.
>> >
>> > Despite the SDM allowing it, we (software) need RDRAND/RDSEED failures
>> > to be exceedingly rare by design.  If they're not, we're going to get
>> > our trusty torches and pitchforks and go after the folks who built the
>> > broken hardware.
>> >
>> > Repeat after me:
>> >
>> >         Regular RDRAND/RDSEED failures only occur on broken hardware
>> >
>> > If it's nice hardware that's gone bad, then we WARN() and try to make
>> > the best of it.  If it turns out that WARN() was because of a broken
>> > hardware _design_ then we go sharpen the pitchforks.
>> >
>> > Anybody disagree?
>> 
>> Yes, I disagree. I made a trivial test that shows RDSEED breaks easily
>> in a busy loop. So at the very least, your statement holds true only
>> for RDRAND.
>> 
>> But, anyway, if the statement "RDRAND failures only occur on broken
>> hardware" is true, then a WARN() in the failure path there presents no
>> DoS potential of any kind, and so that's a straightforward conclusion
>> to this discussion. However, that really hinges on  "RDRAND failures
>> only occur on broken hardware" being a true statement.
>
>There's a useful comment here from an Intel engineer
>
>https://web.archive.org/web/20190219074642/https://software.intel.com/en-us/blogs/2012/11/17/the-difference-between-rdrand-and-rdseed
>
>  "RDRAND is, indeed, faster than RDSEED because it comes
>   from a hardware-based pseudorandom number generator.
>   One seed value (effectively, the output from one RDSEED
>   command) can provide up to 511 128-bit random values
>   before forcing a reseed"
>
>We know we can exhaust RDSEED directly pretty trivially. Making your
>test program run in parallel across 20 cpus, I got a mere 3% success
>rate from RDSEED.
>
>If RDRAND is reseeding every 511 values, RDRAND output would have
>to be consumed significantly faster than RDSEED in order that the
>reseed will happen frequently enough to exhaust the seeds.
>
>This looks pretty hard, but maybe with a large enough CPU count
>this will be possible in extreme load ?
>
>So I'm not convinced we can blindly wave away RDRAND failures as
>guaranteed to mean broken hardware.
>
>With regards,
>Daniel

The general approach has been "don't credit entropy and try again on the next interrupt." We can, of course, be much more aggressive during boot.

We only need 256-512 bits of true entropy in the kernel random pool for breaking it to be equivalent to breaking mainstream crypto primitives, even if the pool operates purely as a PRNG from that point on (which is extremely unlikely). The Linux PRNG has a very large state, which helps buffer entropy variations.

Again, applications *should* be using /dev/[u]random as appropriate, and if they opt to use lower-level primitives in user space they need to implement them correctly – there is literally nothing the kernel can do at that point.

If the probability of success is 3% per attempt, as on your CPUs, that is still about 2 bits of true entropy per invocation on average. However, the probability of failure after 16 loops is over 60%. I think this validates the concept of continuing to poll periodically rather than looping in time-critical paths.

