Message-ID: <YtqIbrds53EuyqPE@zx2c4.com>
Date: Fri, 22 Jul 2022 13:22:22 +0200
From: "Jason A. Donenfeld" <Jason@...c4.com>
To: Holger Dengler <dengler@...ux.ibm.com>
Cc: linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linuxppc-dev@...ts.ozlabs.org, linux-s390@...r.kernel.org,
x86@...nel.org, Will Deacon <will@...nel.org>,
Michael Ellerman <mpe@...erman.id.au>,
Alexander Gordeev <agordeev@...ux.ibm.com>,
Thomas Gleixner <tglx@...utronix.de>,
"H . Peter Anvin" <hpa@...or.com>,
Catalin Marinas <catalin.marinas@....com>,
Borislav Petkov <bp@...e.de>,
Heiko Carstens <hca@...ux.ibm.com>,
Johannes Berg <johannes@...solutions.net>,
Harald Freudenberger <freude@...ux.ibm.com>
Subject: Re: [PATCH v2] random: handle archrandom in plural words
Hi Holger,
On Fri, Jul 22, 2022 at 10:08:05AM +0200, Holger Dengler wrote:
> Why not changing the API to take bytes instead of words? Sure, at the
> moment it looks like all platforms with TRNG support are able to
> deliver at least one word, but bytes would be more flexible.
The idea is to strike a sweet spot between capabilities. S390x is fine
with byte-level granularity up to arbitrary lengths, while x86 is best
with word-level granularity of length 1. The happy intersection between
the two is just word-level granularity of arbitrary length. Yes, we
_could_ introduce a lot of code complexity by cascading the x86 case
down into smaller and smaller registers, ignoring the fact that it's no
longer efficient below 32- or 64-bit registers depending on vendor. But
then we're relying on the inliner to remove all of that extra code,
since all callers actually only ever want 32 or 64 bytes. Why bloat for
nothing? The beauty of this approach is that it translates very
naturally over all the various quirks of architectures without having to
have a lot of coupling code.
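To make the "happy intersection" concrete, here is a minimal userspace sketch of how two very different backends can sit behind the same word-granular signature. Everything prefixed with mock_ is a hypothetical stand-in invented for illustration, not the kernel's implementation, and the values produced are placeholders rather than random data. The (unsigned long *v, size_t max_longs) shape returning a count of words is assumed from the _longs naming discussed in this thread.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical x86-style backend: one instruction yields one word, so it
 * never returns more than 1 no matter how much room the caller offers.
 */
static size_t mock_x86_get_random_longs(unsigned long *v, size_t max_longs)
{
	static unsigned long fake_hw; /* placeholder value, NOT random */

	if (!max_longs)
		return 0;
	*v = ++fake_hw;
	return 1;
}

/* Hypothetical byte-granular source of arbitrary length (s390x-style). */
static void mock_byte_source(unsigned char *buf, size_t nbytes)
{
	memset(buf, 0xA5, nbytes); /* placeholder pattern, NOT random */
}

/*
 * Hypothetical s390x-style backend: the byte-level source fills the whole
 * request at once, so the word count is simply scaled to bytes.
 */
static size_t mock_s390_get_random_longs(unsigned long *v, size_t max_longs)
{
	mock_byte_source((unsigned char *)v, max_longs * sizeof(*v));
	return max_longs;
}
```

Neither backend needs any narrowing or widening logic: the one-word case just returns 1, and the byte-level case multiplies by sizeof(long).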
The other reason is that it's simply not necessary. The primary use for
this in random.c is to fill a 32- or 64-*byte* block with "some stuff",
preferring RDSEED, then RDRAND, and finally falling back to RDTSC. These
correspond with arch_get_random_seed_longs(), arch_get_random_longs(),
and random_get_entropy() (which is usually get_cycles() underneath),
respectively. With the cycle counter being (at least) ~word-sized on all
platforms, keeping the granularity of the arch_get_random_*_longs()
functions the same lets us fill these with a basic cascade that doesn't
require a lot of code:
	unsigned long array[whatever];
	for (i = 0; i < ARRAY_SIZE(array);) {
		longs = arch_get_random_seed_longs(&array[i], ARRAY_SIZE(array) - i);
		if (longs) {
			i += longs;
			continue;
		}
		longs = arch_get_random_longs(&array[i], ARRAY_SIZE(array) - i);
		if (longs) {
			i += longs;
			continue;
		}
		array[i++] = random_get_entropy();
	}
By using a word as the underlying unit, the above cascade generates
optimal code on basically all archrandom platforms, no matter what their
byte-vs-word or one-vs-three-vs-many semantics are.
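For anyone who wants to watch the cascade behave, here is a small userspace sketch that can be compiled outside the kernel. The mock_* sources, their budgets, and the from_* counters are all hypothetical and exist only to make the fallback order observable. The seed source hands out one word per call and then runs dry, the second source hands out several words per call and then runs dry, and the cycle-counter stand-in always succeeds.

```c
#include <stddef.h>

/* Counters recording how many words each source contributed. */
static size_t from_seed, from_rand, from_cycles;

/* Hypothetical RDSEED-like source: one word per call, 3 words in total. */
static size_t seed_budget = 3;
static size_t mock_seed_longs(unsigned long *v, size_t max_longs)
{
	if (!max_longs || !seed_budget)
		return 0;
	*v = 0x5eed0000UL + seed_budget--;
	from_seed++;
	return 1;
}

/* Hypothetical multi-word source: fills as much as asked, 4 words in total. */
static size_t rand_budget = 4;
static size_t mock_random_longs(unsigned long *v, size_t max_longs)
{
	size_t i, n = max_longs < rand_budget ? max_longs : rand_budget;

	for (i = 0; i < n; i++)
		v[i] = 0x7a4d0000UL + i;
	rand_budget -= n;
	from_rand += n;
	return n;
}

/* Hypothetical cycle-counter stand-in: always yields exactly one word. */
static unsigned long mock_cycles(void)
{
	static unsigned long t;

	from_cycles++;
	return ++t;
}

/* Same shape as the cascade above: best source first, then fall back. */
static void fill_block(unsigned long *array, size_t n)
{
	size_t i, longs;

	for (i = 0; i < n;) {
		longs = mock_seed_longs(&array[i], n - i);
		if (longs) {
			i += longs;
			continue;
		}
		longs = mock_random_longs(&array[i], n - i);
		if (longs) {
			i += longs;
			continue;
		}
		array[i++] = mock_cycles();
	}
}
```

Calling fill_block() on an 8-word array with these budgets takes three words from the seed source, four from the second source in a single call, and the last word from the cycle-counter stand-in. The loop body does not care whether a source returns one word or many.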
That's a bit long-winded, but hopefully it gives a bit of insight into
why going from _long -> _longs is so "lazy" looking.
Jason