linux-kernel - Re: [PATCH 2/2] x86/random: Issue a warning if RDRAND or RDSEED fails

Message-ID: <20240206120445.GA1247@wind.enjellic.com>
Date: Tue, 6 Feb 2024 06:04:45 -0600
From: "Dr. Greg" <greg@...ellic.com>
To: "Daniel P. Berrangé" <berrange@...hat.com>
Cc: "Reshetova, Elena" <elena.reshetova@...el.com>,
        "Jason A. Donenfeld" <Jason@...c4.com>,
        "Hansen, Dave" <dave.hansen@...el.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
        Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "H. Peter Anvin" <hpa@...or.com>, "x86@...nel.org" <x86@...nel.org>,
        "Theodore Ts'o" <tytso@....edu>,
        Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        "Nakajima, Jun" <jun.nakajima@...el.com>,
        Tom Lendacky <thomas.lendacky@....com>,
        "Kalra, Ashish" <ashish.kalra@....com>,
        Sean Christopherson <seanjc@...gle.com>,
        "linux-coco@...ts.linux.dev" <linux-coco@...ts.linux.dev>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] x86/random: Issue a warning if RDRAND or RDSEED fails

On Tue, Feb 06, 2024 at 08:04:57AM +0000, Daniel P. Berrangé wrote:

Good morning to everyone.

> On Mon, Feb 05, 2024 at 07:12:47PM -0600, Dr. Greg wrote:
> > 
> > Actually, I now believe there is clear evidence that the problem is
> > indeed Intel specific.  In light of our testing, it will be
> > interesting to see what your 'AR' returns with respect to an official
> > response from Intel engineering on this issue.
> > 
> > One of the very bright young engineers collaborating on Quixote, who
> > has been following this conversation, took it upon himself to do some
> > very methodical engineering analysis on this issue.  I'm the messenger
> > but this is very much his work product.
> > 
> > Executive summary is as follows:
> > 
> > - No RDRAND depletion failures were observable with either the Intel
> >   or AMD hardware that was load tested.
> > 
> > - RDSEED depletion is an Intel specific issue, AMD's RDSEED
> >   implementation could not be provoked into failure.

> My colleague ran a multithreaded parallel stress test program on his
> 16-core/2-HT AMD Ryzen (Zen4 uarch) and saw an 80% failure rate in
> RDSEED.

Interesting data point, thanks for forwarding it along.  So the issue
shows up on at least some AMD platforms as well.

On the 18-core/socket Intel Skylake platform, the parallelized
depletion test forces RDSEED success rates down to around 2%.  An 80%
failure rate, i.e. a success rate of roughly 20%, would suggest that
the AMD platform fares better than the Intel platform.

So this is turning into even more of a morass, given that RDSEED
depletion on AMD may be a function of the micro-architecture the
platform is based on.  The other variable is that our AMD test
platform had a substantially higher core count per socket.  One would
assume that would result in higher depletion rates, if the operative
theory of an RNG infrastructure shared across the socket is valid.

That is, unless AMD engineering understands the problem and has taken
some type of action on higher core count systems to address it.

Of course, another variable may be how the parallelized stress test
is conducted.  If you would like to share your implementation source,
we could give it a twirl on the systems we have access to.
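For anyone wanting to reproduce the depletion numbers, the following is
a minimal sketch of a parallel RDSEED stress test.  It is not the test
used by either party in this thread.  The function names
(rdseed_success_rate() and its helpers) are hypothetical, and the code
assumes an x86-64 build with GCC or Clang, whose <cpuid.h> and
<immintrin.h> provide __get_cpuid_count() and _rdseed64_step().  Each
thread issues RDSEED back to back with no retries, and the function
returns the success fraction across all threads.  On other
architectures, or on CPUs without RDSEED, it returns -1.0.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <cpuid.h>
#include <immintrin.h>

/* RDSEED support is reported in CPUID leaf 7, subleaf 0, EBX bit 18. */
static bool cpu_has_rdseed(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return (ebx & bit_RDSEED) != 0;
}

/* One RDSEED attempt: 1 if the CPU reported success (CF=1), else 0. */
__attribute__((target("rdseed")))
static int rdseed_once(unsigned long long *out)
{
    return _rdseed64_step(out);
}
#else
/* Assumption: non-x86-64 builds simply report RDSEED as unavailable. */
static bool cpu_has_rdseed(void) { return false; }
static int rdseed_once(unsigned long long *out) { (void)out; return 0; }
#endif

#define MAX_THREADS 256

struct rdseed_worker {
    pthread_t tid;
    uint64_t iterations;
    uint64_t successes;
};

/* Hammer RDSEED with no pauses or retries, counting successes. */
static void *rdseed_hammer(void *arg)
{
    struct rdseed_worker *w = arg;
    unsigned long long value;

    for (uint64_t i = 0; i < w->iterations; i++)
        w->successes += (uint64_t)rdseed_once(&value);
    return NULL;
}

/*
 * Run nthreads concurrent workers and return the fraction of RDSEED
 * attempts that succeeded, or -1.0 if RDSEED is unavailable or the
 * arguments are out of range.
 */
static double rdseed_success_rate(int nthreads, uint64_t iterations)
{
    struct rdseed_worker workers[MAX_THREADS];
    uint64_t ok = 0, total = 0;
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS || iterations == 0 ||
        !cpu_has_rdseed())
        return -1.0;

    for (int i = 0; i < nthreads; i++) {
        workers[i].iterations = iterations;
        workers[i].successes = 0;
        if (pthread_create(&workers[i].tid, NULL, rdseed_hammer,
                           &workers[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(workers[i].tid, NULL);
        ok += workers[i].successes;
        total += workers[i].iterations;
    }
    assert(ok <= total);
    return total ? (double)ok / (double)total : -1.0;
}
```

One way to use it is to compare rdseed_success_rate(1, 1000000) with a
run that uses one thread per online CPU (sysconf(_SC_NPROCESSORS_ONLN)).
That comparison should show whether contention for a shared entropy
source is what drives the rate down.  Builds against glibc older than
2.34 need -pthread.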

The continuing operative question is whether or not any of this ever
leads to an RDRAND failure.

We've conducted some additional tests on the Intel platform where
RDSEED depletion was driven as low as possible, to ~1-2% success
rates, while RDRAND depletion tests were being run simultaneously.  No
RDRAND failures have been noted.
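The following is a sketch of how such an RDRAND failure count might be
collected, assuming the same x86-64 GCC/Clang toolchain.  The retry
bound of 10 mirrors the RDRAND_RETRY_LOOPS constant in the kernel's
arch/x86/include/asm/archrandom.h.  The helper names are hypothetical,
and the stderr message is only a user-space stand-in for the warning
this patch proposes.  A return value of 0 corresponds to the "no
RDRAND failures" outcome reported above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define RDRAND_RETRIES 10  /* mirrors the kernel's RDRAND_RETRY_LOOPS */

#if defined(__x86_64__)
#include <cpuid.h>
#include <immintrin.h>

/* RDRAND support is reported in CPUID leaf 1, ECX bit 30. */
static bool cpu_has_rdrand(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx & bit_RDRND) != 0;
}

/* One RDRAND attempt: 1 if the CPU reported success (CF=1), else 0. */
__attribute__((target("rdrnd")))
static int rdrand_once(unsigned long long *out)
{
    return _rdrand64_step(out);
}
#else
/* Assumption: non-x86-64 builds simply report RDRAND as unavailable. */
static bool cpu_has_rdrand(void) { return false; }
static int rdrand_once(unsigned long long *out) { (void)out; return 0; }
#endif

/* Bounded retry: true on success, false after RDRAND_RETRIES failures. */
static bool rdrand_with_retry(uint64_t *out)
{
    unsigned long long v;

    assert(out != NULL);
    for (int i = 0; i < RDRAND_RETRIES; i++) {
        if (rdrand_once(&v)) {
            *out = v;
            return true;
        }
    }
    return false;
}

/*
 * Draw `calls` values and return how many failed even after retrying,
 * printing a warning for each one; return -1 if RDRAND is unavailable.
 */
static long rdrand_exhausted_count(uint64_t calls)
{
    long failures = 0;
    uint64_t value;

    if (!cpu_has_rdrand())
        return -1;
    for (uint64_t i = 0; i < calls; i++) {
        if (!rdrand_with_retry(&value)) {
            failures++;
            fprintf(stderr, "warning: RDRAND failed %d times in a row\n",
                    RDRAND_RETRIES);
        }
    }
    return failures;
}
```

Running rdrand_exhausted_count() while a separate RDSEED stress load
keeps the entropy source depleted would be the analogue of the
simultaneous test described above.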

So the operative question remains: why worry about this if RDRAND is
used as the randomness primitive?

We haven't seen anything out of Intel yet on this.  Maybe AMD has a
quantitative definition of 'astronomical' when it comes to RDRAND
failures.

The silence appears to be deafening out of the respective engineering
camps... :-)

> > - AMD's RDRAND/RDSEED implementation is significantly slower than
> >   Intel's.

> Yes, we also noticed the AMD impl is horribly slow compared to
> Intel; we had to cut test iterations by 100x.

The operative question is the impact of 'slow' in the absence of
artificial stress tests.

It would seem that a major question is what the engineering thought
process was, or is, regarding the throughput of the hardware
randomness instructions.

Intel documents the following randomness throughput rates:

RDSEED: 3 Gbit/second
RDRAND: 6.4 Gbit/second

If there is a possibility of over-harvesting randomness, why not
design the implementations to be clamped at some per-core value, such
as a megabit/second?  At the documented RDSEED generation rate, that
would allow the servicing of 3222 cores, if my math at 0530 in the
morning is correct.

Would a core need more than 128 kilobytes of randomness, i.e. one
second of output, to effectively seed a random number generator?
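The arithmetic in the two paragraphs above can be checked with a few
lines of C.  This sketch takes the documented 3 Gbit/second RDSEED rate
and the hypothetical one megabit/second per-core clamp.  The answer
depends on how the prefixes are read:

- Decimal prefixes give 3000 cores.
- Binary prefixes give 3072 cores.
- A binary gigabit divided by a decimal megabit gives about 3221 cores.

Likewise, one second of output at the clamp is 125,000 bytes for a
decimal megabit and 131,072 bytes (128 KiB) for a binary one.

```c
#include <assert.h>
#include <stdint.h>

/* Cores served = aggregate generation rate / per-core clamp (bits/s). */
static uint64_t cores_served(uint64_t aggregate_bps, uint64_t per_core_bps)
{
    assert(per_core_bps > 0);
    return aggregate_bps / per_core_bps;
}

/* Bytes a single core can draw in `seconds` at `per_core_bps`. */
static uint64_t bytes_per_core(uint64_t per_core_bps, uint64_t seconds)
{
    return per_core_bps * seconds / 8;
}
```

For example, cores_served(3000000000ULL, 1000000ULL) gives 3000, and
cores_served(3ULL << 30, 1ULL << 20) gives 3072.  The same helper
applied to the documented 6.4 Gbit/second RDRAND rate gives 6400 cores
with decimal prefixes.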

A cynical conclusion would be that engineering acquiesced to marketing
demands... :-)

> With regards,
> Daniel

Have a good day.

As always,
Dr. Greg

The Quixote Project - Flailing at the Travails of Cybersecurity
              https://github.com/Quixote-Project