Message-ID: <82426a0d-df58-476d-b6e4-64b1415bf61e@arm.com>
Date: Mon, 17 Nov 2025 17:23:52 +0000
From: Ryan Roberts <ryan.roberts@....com>
To: Arnd Bergmann <arnd@...db.de>, Kees Cook <kees@...nel.org>,
Ard Biesheuvel <ardb@...nel.org>, Jeremy Linton <jeremy.linton@....com>,
Will Deacon <will@...nel.org>, Catalin Marinas <Catalin.Marinas@....com>,
Mark Rutland <mark.rutland@....com>
Cc: "linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
"Jason A. Donenfeld" <Jason@...c4.com>
Subject: Re: [DISCUSSION] kstack offset randomization: bugs and performance

On 17/11/2025 16:47, Arnd Bergmann wrote:
> On Mon, Nov 17, 2025, at 12:31, Ryan Roberts wrote:
>> On 17/11/2025 11:30, Ryan Roberts wrote:
>>> Hi All,
>>>
>>> Over the last few years we had a few complaints that syscall performance on
>>> arm64 is slower than x86. Most recently, it was observed that a certain Java
>>> benchmark that does a lot of fstat and lseek is spending ~10% of its time in
>>> get_random_u16(). Cue a bit of digging, which led me to [1] and also to some new
>>> ideas about how performance could be improved.
>
>
>>> I believe this helps the mean latency significantly without sacrificing any
>>> strength. But it doesn't reduce the tail latency because we still have to call
>>> into the crng eventually.
>>>
>>> So here's another idea: Could we use siphash to generate some random bits? We
>>> would generate the secret key at boot using the crng. Then generate a 64-bit
>>> siphash of (cntvct_el0 ^ tweak), where tweak increments every time we generate a
>>> new hash. As long as the key remains secret, the hash is unpredictable.
>>> (Perhaps we don't even need the timer value.) Every hash gives us 64 bits,
>>> which lasts for 10 syscalls at 6 bits per call. So we would still have to
>>> call siphash every 10 syscalls, and there would still be a tail, but from my
>>> experiments it's much less than the crng:
>
> IIRC, Jason argued against creating another type of PRNG inside the
> kernel for a special purpose.
siphash is already supported by the kernel (siphash_1u64() and friends), so I
think you could argue this is just creative use of an existing crypto primitive? ;-)
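
Something like the sketch below is roughly what I have in mind. It's untested,
and all the kstack_rnd_* names are invented for illustration; only
siphash_1u64() and siphash_key_t are the kernel's existing helpers from
<linux/siphash.h>:

#include <linux/init.h>
#include <linux/percpu.h>
#include <linux/random.h>
#include <linux/siphash.h>
#include <asm/sysreg.h>

/* Secret key, generated from the crng once at boot. */
static siphash_key_t kstack_rnd_key __ro_after_init;

struct kstack_rnd {
	u64 bits;	/* unused hashed bits */
	u64 tweak;	/* incremented on every refill */
	u8  count;	/* 6-bit chunks left in @bits */
};
static DEFINE_PER_CPU(struct kstack_rnd, kstack_rnd);

static int __init kstack_rnd_init(void)
{
	get_random_bytes(&kstack_rnd_key, sizeof(kstack_rnd_key));
	return 0;
}
arch_initcall(kstack_rnd_init);

/* Assumed to be called with preemption disabled (syscall entry/exit). */
static u16 kstack_rnd_u16(void)
{
	struct kstack_rnd *r = this_cpu_ptr(&kstack_rnd);
	u16 ret;

	if (!r->count) {
		/* One refill lasts 10 syscalls at 6 bits per call. */
		r->bits = siphash_1u64(read_sysreg(cntvct_el0) ^ r->tweak++,
				       &kstack_rnd_key);
		r->count = 10;
	}

	ret = r->bits & 0x3f;
	r->bits >>= 6;
	r->count--;
	return ret;
}

The tail cost then becomes one siphash_1u64() on every 10th syscall instead of
a call into the crng.
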
>
> As I understand it, the other architectures already just use the cycle counter
> because that is random enough, but for arm64 the cntvct runs at an
> unspecified frequency that is often too low.
>
> However, most future machines are ARMv9.1 or higher and require a 1GHz
> timer frequency. I also checked Graviton-3 (Neoverse-V1, ARMv8.4), which
> is running its timer at 1.05GHz.
>
> My M2 Mac runs its timer at a slower 24MHz. Between two getpid()
> syscalls, I see cntvct_el0 advance between 20 and 70 cycles, which
> still gives a few bits of entropy but not the six bits we actually
> want to use.
>
> How about we just check the timer frequency at boot and patch out the
> get_random_u16() call in favour of a cntvct read if it gets updated fast enough?
> That would at least take care of the overhead on most new designs and
> hopefully on a large subset of the servers that are in active use.
That certainly sounds simple and reasonable to me (as a non-security guy). My
earlier optimizations would still improve mean latency on the non-conformant systems.
We would then end up with arm64 just as performant as the other arches for the
same level of security. If that level of security is deemed insufficient, a new
option to use get_random_u16() (or siphash) could be introduced, which would be
arch-independent.
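
To illustrate your suggestion, here's an untested sketch; the static key, the
helper name and the 500MHz threshold are all invented for this example, but
arch_timer_get_cntfrq(), __arch_counter_get_cntvct() and the static-branch
machinery all exist today:

#include <linux/init.h>
#include <linux/jump_label.h>
#include <linux/random.h>
#include <asm/arch_timer.h>

/* Arbitrary cut-off; below this the counter moves too slowly per syscall. */
#define KSTACK_RND_MIN_CNTFRQ	(500UL * 1000 * 1000)

static DEFINE_STATIC_KEY_FALSE(kstack_use_cntvct);

static int __init kstack_rnd_setup(void)
{
	if (arch_timer_get_cntfrq() >= KSTACK_RND_MIN_CNTFRQ)
		static_branch_enable(&kstack_use_cntvct);
	return 0;
}
arch_initcall(kstack_rnd_setup);

static u16 kstack_offset_bits(void)
{
	/* Fast path on conformant systems: raw counter bits. */
	if (static_branch_likely(&kstack_use_cntvct))
		return __arch_counter_get_cntvct() & 0x3f;

	/* Fallback for slow timers: keep using the crng. */
	return get_random_u16() & 0x3f;
}
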
Thanks,
Ryan
>
> Arnd