Message-ID: <ZcItU5FKlIVEEVte@redhat.com>
Date: Tue, 6 Feb 2024 13:00:03 +0000
From: Daniel P. Berrangé <berrange@...hat.com>
To: "Dr. Greg" <greg@...ellic.com>
Cc: "Reshetova, Elena" <elena.reshetova@...el.com>,
"Jason A. Donenfeld" <Jason@...c4.com>,
"Hansen, Dave" <dave.hansen@...el.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>, "x86@...nel.org" <x86@...nel.org>,
Theodore Ts'o <tytso@....edu>,
Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>,
"Nakajima, Jun" <jun.nakajima@...el.com>,
Tom Lendacky <thomas.lendacky@....com>,
"Kalra, Ashish" <ashish.kalra@....com>,
Sean Christopherson <seanjc@...gle.com>,
"linux-coco@...ts.linux.dev" <linux-coco@...ts.linux.dev>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] x86/random: Issue a warning if RDRAND or RDSEED fails
On Tue, Feb 06, 2024 at 06:04:45AM -0600, Dr. Greg wrote:
> On Tue, Feb 06, 2024 at 08:04:57AM +0000, Daniel P. Berrangé wrote:
>
> Good morning to everyone.
>
> > On Mon, Feb 05, 2024 at 07:12:47PM -0600, Dr. Greg wrote:
> > >
> > > Actually, I now believe there is clear evidence that the problem is
> > > indeed Intel specific. In light of our testing, it will be
> > > interesting to see what your 'AR' returns with respect to an official
> > > response from Intel engineering on this issue.
> > >
> > > One of the very bright young engineers collaborating on Quixote, who
> > > has been following this conversation, took it upon himself to do some
> > > very methodical engineering analysis on this issue. I'm the messenger
> > > but this is very much his work product.
> > >
> > > Executive summary is as follows:
> > >
> > > - No RDRAND depletion failures were observable with either the Intel
> > > or AMD hardware that was load tested.
> > >
> > > - RDSEED depletion is an Intel-specific issue; AMD's RDSEED
> > > implementation could not be provoked into failure.
>
> > My colleague ran a multithreaded parallel stress test program on his
> > 16core/2HT AMD Ryzen (Zen4 uarch) and saw an 80% failure rate in
> > RDSEED.
>
> Interesting datapoint, thanks for forwarding it along; so the issue
> shows up on at least some AMD platforms as well.
>
> On the 18 core/socket Intel Skylake platform, the parallelized
> depletion test forces RDSEED success rates down to around 2%. Your
> tests would suggest that the AMD platform fares better than the
> Intel platform.
Yes, given the speed of the AMD RDRAND/RDSEED ops compared to those
on my Intel test platforms, their DRBG looks better able to keep up
with the demand for bits.
> Of course, the other variable may be how the parallelized stress test
> is conducted. If you would like to share your implementation source
> we could give it a twirl on the systems we have access to.
It is just Jason's earlier test program, but moved into one thread
for each core....
$ cat cpurngstress.c
#include <stdio.h>
#include <immintrin.h>
#include <pthread.h>
#include <unistd.h>

/*
 * Gives about 25 seconds wallclock time on my Alderlake CPU
 *
 * Probably want to reduce this x10, or possibly even x100
 * on AMD due to much slower ops.
 */
#define MAX_ITER 10000000
#define MAX_CPUS 4096

/* Per-thread worker: count how many RDSEED and RDRAND attempts succeed */
void *doit(void *f) {
    unsigned long long rand;
    unsigned int i, success_rand = 0, success_seed = 0;

    for (i = 0; i < MAX_ITER; ++i) {
        success_seed += !!_rdseed64_step(&rand);
    }
    for (i = 0; i < MAX_ITER; ++i) {
        success_rand += !!_rdrand64_step(&rand);
    }

    fprintf(stderr,
            "RDRAND: %.2f%%, RDSEED: %.2f%%\n",
            success_rand * 100.0 / MAX_ITER,
            success_seed * 100.0 / MAX_ITER);

    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t th[MAX_CPUS];
    int nproc = sysconf(_SC_NPROCESSORS_ONLN);

    if (nproc > MAX_CPUS) {
        nproc = MAX_CPUS;
    }

    fprintf(stderr, "Stressing RDRAND/RDSEED across %d CPUs\n", nproc);

    /* One stress thread per online CPU */
    for (int i = 0; i < nproc; i++) {
        pthread_create(&th[i], NULL, doit, NULL);
    }
    for (int i = 0; i < nproc; i++) {
        pthread_join(th[i], NULL);
    }

    return 0;
}
$ gcc -march=native -o cpurngstress cpurngstress.c
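(Two small asides: on toolchains where libpthread is not yet folded
into libc you may need an explicit -pthread on that gcc line, and if
anyone wants to put a number on the raw per-op speed difference
mentioned above, a single-threaded timing loop along these lines
should do - treat it as a rough, untested sketch; the file name and
iteration count are arbitrary.)

$ cat rdseedtime.c
#include <stdio.h>
#include <time.h>
#include <immintrin.h>

/* Arbitrary number of attempts; scale up or down to taste */
#define OPS 1000000

int main(void)
{
    unsigned long long r;
    unsigned int ok = 0;
    struct timespec a, b;

    clock_gettime(CLOCK_MONOTONIC, &a);
    for (unsigned int i = 0; i < OPS; i++)
        ok += !!_rdseed64_step(&r);
    clock_gettime(CLOCK_MONOTONIC, &b);

    double secs = (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    fprintf(stderr,
            "%u/%u RDSEED attempts succeeded in %.3fs (%.2f Mops/s)\n",
            ok, OPS, secs, OPS / secs / 1e6);
    return 0;
}
$ gcc -march=native -o rdseedtime rdseedtime.c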
> If there is the possibility of over-harvesting randomness, why not
> design the implementations to be clamped at some per-core value such
> as a megabit/second? In the case of the documented RDSEED generation
> rates, that would allow the servicing of 3222 cores, if my math at
> 0530 in the morning is correct.
>
> Would a core need more than 128 kilobytes of randomness, i.e. one
> second of output, to effectively seed a random number generator?
>
> A cynical conclusion would suggest engineering acquiescing to
> marketing demands... :-)
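(Sanity-checking the arithmetic: taking a megabit as 2^20 bits, 1
Mbit/s per core is indeed the 128 kilobytes/s you quote, and 3222
cores at that budget back-calculates to an assumed aggregate RDSEED
throughput of roughly 3.2 Gbit/s.)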
My assumption is that it was simply easier not to implement a
rate-limiting feature at the CPU level and to punt the starvation
problem to software :-)
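For completeness, the software side of that punt usually ends up
looking something like the sketch below - a bounded RDSEED retry with
a pause between attempts, then a fallback to the DRBG-backed RDRAND
rather than spinning forever. The retry counts are arbitrary
illustration on my part, not anything normative from the kernel or
from Intel's guidance:

#include <stdbool.h>
#include <immintrin.h>

#define RDSEED_RETRIES 100
#define RDRAND_RETRIES 10

static bool get_seed64(unsigned long long *out)
{
    for (int i = 0; i < RDSEED_RETRIES; i++) {
        if (_rdseed64_step(out))
            return true;
        _mm_pause();      /* back off briefly before the next attempt */
    }
    /* RDSEED stayed exhausted: fall back to RDRAND */
    for (int i = 0; i < RDRAND_RETRIES; i++) {
        if (_rdrand64_step(out))
            return true;
    }
    return false;         /* caller has to cope with total failure */
}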
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|