Message-ID: <23483431.vTCxPXJkl2@wukong>
Date: Tue, 22 Oct 2024 22:49:54 +0200
From: Bernd Paysan <bernd@...2o.de>
To: linux-kernel@...r.kernel.org
Cc: "Jason A. Donenfeld" <Jason@...c4.com>,
David Hildenbrand <david@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>, Yu Zhao <yu.zhao@...el.com>,
Adhemerval Zanella Netto <adhemerval.zanella@...aro.org>
Subject: [PATCH v23] getrandom() in vDSO (still flawed)
This recently came to my attention (I'm not a Linux kernel developer, but I
develop a secure network protocol (net2o) and also did a hardware CPRNG for
cryptech.is) after the author of dietlibc had a look at getrandom() in the
vDSO. I helped him review it. We concluded that he should NOT implement it at
the moment. In particular, this is not the usual easy case where you just link
to a symbol with the exact same signature in the vDSO lib instead of making a
kernel call; it puts an additional (and fragile!) burden on the libc
maintainer: managing an opaque state.
Random numbers are too important, so I subscribed to LKML… and am jumping in
with my flame-proof suit on.
CC to everybody who replied to [Patch v22] of this proposal, and the person
who wrote the glibc patch.
Linus Torvalds wrote:
> In other words, I want to see actual *users* piping up and saying "this is a
> problem, here's my real load that spends 10% of time on getrandom(), and
> this fixes it".
# My usage #
In my net2o crypto stack, I call getrandom() once at process start, and
roll my own CPRNG. Even if Linux kernels had a superfast getrandom(), staying
backward compatible and portable across OSes requires me to roll my own CPRNG
for at least the next decade. I still check whether getrandom() is actually
available, and read /dev/urandom if not. I feel comfortable generating
randomness myself, but I understand that others don't and shouldn't.
# When do you need lots and lots of randomness? #
For connection-oriented protocols, you can and should generate the nonces on
both sides starting from a shared secret, and not transmit them over the
network at all. Then there's no per-packet kernel getrandom() call involved,
but at most a per-connection call. Connection-oriented protocols that require
a CPRNG nonce per packet are ill-conceived, but they do exist. Encrypting
1GB/s on a single core is no problem, so calling getrandom() per packet
would be a clearly noticeable slowdown. sendmmsg/recvmmsg exist for sending
and receiving multiple packets with one kernel call, and they are necessary.
That leaves connectionless protocols, where you can't avoid generating some
random stuff per packet; usually ephemeral keys (where you don't send the raw
random numbers over the network). Those protocols have a higher overhead, as
they need a per-packet key exchange. Still, >100k/s of such DHEs can be done
per core with the recent speed improvements of x/ed25519, so the overhead of a
getrandom() call per packet is measurable.
So while I don't need this, I'm not saying it is completely useless.
# There are however some flaws that need to be fixed first #
1. Burden on the libc implementer to allocate the state, re-initialize it on
fork() and whatnot. This is asking for disaster: one maintainer or another
will certainly get something wrong, and it's not just glibc, it's also musl,
bionic, eglibc, dietlibc, and programs that prefer to talk to the kernel
directly to avoid broken libcs. It really has to be getrandom(buf, size,
flags). That's how the other vDSO entries work, and that's as far as I trust
libc maintainers. I don't trust them to maintain the state by themselves.
2. There's no documentation. This would only be ok if it was a 1:1
replacement of the syscall that doesn't need documentation.
3. The ChaCha20 (good choice) based implementation requires more state than
necessary.
So, how much state do you need for a multi-threaded CPRNG, based on ChaCha20?
Answer:
1 *shared* secret key (256 or 512 bits) *per process* (yes, you can share that
secret)
1 64 bit counter and
1 64 bit nonce (essentially pthread_self) *per thread*
The per-thread 128 bits don't have to be secret. The counter can be shared or
per-thread. This means handing over the burden of allocating state to the
library maintainer is completely unnecessary.
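To make the state layout above concrete, here is a rough sketch of a generator built on the original ChaCha20 layout (64-bit counter, 64-bit nonce). It is not the code from the patch. The names rng_shared, rng_thread, chacha20_block() and rng_fill() are hypothetical, and the sketch leaves out rekeying and fork handling:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Shared per process: one secret 256-bit key, seeded once. */
struct rng_shared { uint32_t key[8]; };

/* Per thread, not secret: 64-bit block counter and 64-bit nonce.
 * The nonce only has to differ between threads. */
struct rng_thread { uint64_t counter; uint64_t nonce; };

static uint32_t rotl32(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}

#define QR(a, b, c, d) do {                 \
    a += b; d ^= a; d = rotl32(d, 16);      \
    c += d; b ^= c; b = rotl32(b, 12);      \
    a += b; d ^= a; d = rotl32(d, 8);       \
    c += d; b ^= c; b = rotl32(b, 7);       \
} while (0)

/* One ChaCha20 block: 20 rounds over constants, key, counter, nonce. */
void chacha20_block(const uint32_t key[8], uint64_t counter,
                    uint64_t nonce, uint32_t out[16])
{
    uint32_t s[16] = {
        0x61707865, 0x3320646e, 0x79622d32, 0x6b206574, /* "expand 32-byte k" */
        key[0], key[1], key[2], key[3],
        key[4], key[5], key[6], key[7],
        (uint32_t)counter, (uint32_t)(counter >> 32),
        (uint32_t)nonce, (uint32_t)(nonce >> 32)
    };
    uint32_t x[16];

    memcpy(x, s, sizeof x);
    for (int i = 0; i < 10; i++) {          /* 10 double rounds */
        QR(x[0], x[4], x[8],  x[12]);       /* column rounds */
        QR(x[1], x[5], x[9],  x[13]);
        QR(x[2], x[6], x[10], x[14]);
        QR(x[3], x[7], x[11], x[15]);
        QR(x[0], x[5], x[10], x[15]);       /* diagonal rounds */
        QR(x[1], x[6], x[11], x[12]);
        QR(x[2], x[7], x[8],  x[13]);
        QR(x[3], x[4], x[9],  x[14]);
    }
    for (int i = 0; i < 16; i++)
        out[i] = x[i] + s[i];
}

/* Fill buf with len bytes; touches only the caller's per-thread state.
 * Words are copied in host byte order, so the byte stream matches the
 * standard keystream only on little-endian machines. */
void rng_fill(const struct rng_shared *sh, struct rng_thread *th,
              void *buf, size_t len)
{
    unsigned char *p = buf;
    uint32_t block[16];

    while (len > 0) {
        size_t n = len < sizeof block ? len : sizeof block;
        chacha20_block(sh->key, th->counter++, th->nonce, block);
        memcpy(p, block, n);
        p += n;
        len -= n;
    }
}
```

In this sketch the key is read-only after seeding, so threads need no locking as long as each one owns its rng_thread and uses a distinct nonce. Because the key is never replaced, a leaked key exposes earlier output as well; whether that is acceptable is a separate design question.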
# In summary: This can be fixed #
But at the moment, C library maintainers should not touch it.
--
Bernd Paysan
"If you want it done right, you have to do it yourself"
net2o id: kQusJzA;7*?t=uy@...GWr!+0qqp_Cn176t4(dQ*
https://net2o.de/