Message-ID: <aRyppb8PCxFKVphr@zx2c4.com>
Date: Tue, 18 Nov 2025 18:15:17 +0100
From: "Jason A. Donenfeld" <Jason@...c4.com>
To: Arnd Bergmann <arnd@...db.de>
Cc: Ryan Roberts <ryan.roberts@....com>, Kees Cook <kees@...nel.org>,
Ard Biesheuvel <ardb@...nel.org>,
Jeremy Linton <jeremy.linton@....com>,
Will Deacon <will@...nel.org>,
Catalin Marinas <Catalin.Marinas@....com>,
Mark Rutland <mark.rutland@....com>,
"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [DISCUSSION] kstack offset randomization: bugs and performance

On Mon, Nov 17, 2025 at 05:47:05PM +0100, Arnd Bergmann wrote:
> On Mon, Nov 17, 2025, at 12:31, Ryan Roberts wrote:
> > On 17/11/2025 11:30, Ryan Roberts wrote:
> >> Hi All,
> >>
> >> Over the last few years we had a few complaints that syscall performance on
> >> arm64 is slower than x86. Most recently, it was observed that a certain Java
> >> benchmark that does a lot of fstat and lseek is spending ~10% of its time in
> >> get_random_u16(). Cue a bit of digging, which led me to [1] and also to some new
> >> ideas about how performance could be improved.
>
>
> >> I believe this helps the mean latency significantly without sacrificing any
> >> strength. But it doesn't reduce the tail latency because we still have to call
> >> into the crng eventually.
> >>
> >> So here's another idea: Could we use siphash to generate some random bits? We
> >> would generate the secret key at boot using the crng. Then generate a 64 bit
> >> siphash of (cntvct_el0 ^ tweak) (where tweak increments every time we generate a
> >> new hash). As long as the key remains secret, the hash is unpredictable.
> >> (perhaps we don't even need the timer value). For every hash we get 64 bits, so
> >> that would last for 10 syscalls at 6 bits per call. So we would still have to
> >> call siphash every 10 syscalls, so there would still be a tail, but from my
> >> experiments, it's much less than the crng:
>
> IIRC, Jason argued against creating another type of prng inside of the
> kernel for a special purpose.
Yes indeed... I'm really not a fan of adding bespoke crypto willy-nilly
like that. Let's make get_random_u*() faster. If you're finding that the
issue with it is the locking, and that you're calling this from irq
context anyway, then your proposal (if I read this discussion correctly)
to add a raw_get_random_u*() seems like it could be sensible. Those
functions are generated via macro anyway, so it wouldn't be too much to
add the raw overloads. Feel free to send a patch to my random.git tree
if you'd like to give that a try.
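
To make the shape of that concrete, here is a rough userspace sketch of
macro-generated getters with a raw variant next to the locked one. It is
not the random.c code. Every name in it (DEFINE_BATCHED, refill_batch,
batch_lock, refill_count) is made up for illustration, the refill is
splitmix64 as a placeholder and not a CSPRNG, and the atomic_flag
spinlock only stands in for whatever exclusion the real code needs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t u8;
typedef uint16_t u16;
typedef uint32_t u32;
typedef uint64_t u64;

#define BATCH_BYTES 64 /* assumed batch size for this sketch */

/* Hypothetical stand-ins; none of these are kernel symbols. */
static atomic_flag batch_lock = ATOMIC_FLAG_INIT;
static u64 refill_state = 0x9e3779b97f4a7c15ULL;
static unsigned long refill_count; /* counts the slow-path refills */

/* Placeholder refill using splitmix64. NOT cryptographically secure. */
static void refill_batch(void *buf, size_t len)
{
	unsigned char *p = buf;

	assert(len % sizeof(u64) == 0);
	refill_count++;
	for (size_t off = 0; off < len; off += sizeof(u64)) {
		u64 z = (refill_state += 0x9e3779b97f4a7c15ULL);

		z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
		z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
		z ^= z >> 31;
		memcpy(p + off, &z, sizeof(z));
	}
}

#define DEFINE_BATCHED(type)                                             \
struct batch_##type {                                                    \
	type entropy[BATCH_BYTES / sizeof(type)];                        \
	unsigned int position;                                           \
};                                                                       \
/* position starts at the end so the first call triggers a refill */    \
static struct batch_##type batch_##type = {                              \
	.position = BATCH_BYTES / sizeof(type),                          \
};                                                                       \
/* raw variant: the caller must already guarantee exclusion */          \
static inline type raw_get_random_##type(void)                           \
{                                                                        \
	if (batch_##type.position >= BATCH_BYTES / sizeof(type)) {       \
		refill_batch(batch_##type.entropy, BATCH_BYTES);         \
		batch_##type.position = 0;                               \
	}                                                                \
	return batch_##type.entropy[batch_##type.position++];            \
}                                                                        \
/* locked variant: take the lock, then reuse the raw body */            \
static inline type get_random_##type(void)                               \
{                                                                        \
	type ret;                                                        \
                                                                         \
	while (atomic_flag_test_and_set_explicit(&batch_lock,            \
						 memory_order_acquire))  \
		; /* spin */                                             \
	ret = raw_get_random_##type();                                   \
	atomic_flag_clear_explicit(&batch_lock, memory_order_release);   \
	return ret;                                                      \
}

DEFINE_BATCHED(u8)
DEFINE_BATCHED(u16)
DEFINE_BATCHED(u32)
DEFINE_BATCHED(u64)
```

In this sketch a caller that already has exclusion would use
raw_get_random_u16() directly and skip the lock, while everything else
keeps calling get_random_u16(). Both share one body, so the macro only
grows by the extra wrapper.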

Jason