Date:   Sat, 30 Jul 2022 00:06:26 +0200
From:   "Jason A. Donenfeld" <Jason@...c4.com>
To:     Florian Weimer <fweimer@...hat.com>
Cc:     linux-kernel@...r.kernel.org, linux-crypto@...r.kernel.org,
        x86@...nel.org, Nadia Heninger <nadiah@...ucsd.edu>,
        Thomas Ristenpart <ristenpart@...nell.edu>,
        Theodore Ts'o <tytso@....edu>,
        Vincenzo Frascino <vincenzo.frascino@....com>,
        Adhemerval Zanella Netto <adhemerval.zanella@...aro.org>
Subject: Re: [PATCH RFC v1] random: implement getrandom() in vDSO

Hey Florian,

Thanks for the feedback.

On Fri, Jul 29, 2022 at 10:19:05PM +0200, Florian Weimer wrote:
> > +	if (getcpu(&start, NULL, NULL) == 0)
> > +		start %= NUM_BUCKETS;
> 
> getcpu is not available everywhere.  Userspace/libc should probably
> provide a CPU number hint as an additional argument during the vDSO
> call.  We can load that easily enough from rseq.  That's going to be
> faster on x86, too (the LSL instruction is quite slow).  The only
> advantage of using getcpu like this is that it's compatible with a libc
> that isn't rseq-enabled.

Actually, the only requirement is that the value is somewhat stable and
separates threads from each other most of the time. So a per-thread ID or even a
per-thread address would work fine too. Adhemerval suggested on IRC this
afternoon that there's a thread pointer register value that would be
usable for this purpose. I think what I'll do for v2 is abstract this
out to a __arch_get_bucket_hint() function, or similar, which the
different archs can fill in.

> > +	for (i = start;;) {
> > +		struct getrandom_state *state = &buckets[i];
> > +
> > +		if (cmpxchg(&state->in_use, false, true) == false)
> > +			return state;
> > +
> > +		i = i == NUM_BUCKETS - 1 ? 0 : i + 1;
> > +		if (i == start)
> > +			break;
> > +	}
> 
> Surely this scales very badly once the number of buckets is smaller than
> the system processor count?

Right, and there are a few ways that observation can go:

1) It doesn't matter, because who has > 28 threads all churning at once
   here? Is that something real?

2) The state variable is controllable by userspace, so in theory
   different ones could be passed. I don't like this idea though - hard
   to manage and not enough information to do it well.

3) Since we know when this kind of contention is hit, it should be
   possible to just expand the map size. Seems a bit complicated.

4) Simply allocate a number of pages relative to the number of CPUs, so
   that this isn't actually a problem. This seems like the simplest
   approach and should be fine; a rough sketch is below.

Jason 
