Message-Id: <dd5b4423-0954-44d5-99a5-0052b62c55af@app.fastmail.com>
Date: Mon, 17 Nov 2025 17:47:05 +0100
From: "Arnd Bergmann" <arnd@...db.de>
To: "Ryan Roberts" <ryan.roberts@....com>, "Kees Cook" <kees@...nel.org>,
"Ard Biesheuvel" <ardb@...nel.org>, "Jeremy Linton" <jeremy.linton@....com>,
"Will Deacon" <will@...nel.org>, "Catalin Marinas" <Catalin.Marinas@....com>,
"Mark Rutland" <mark.rutland@....com>
Cc:
"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>,
"Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>,
"Jason A. Donenfeld" <Jason@...c4.com>
Subject: Re: [DISCUSSION] kstack offset randomization: bugs and performance
On Mon, Nov 17, 2025, at 12:31, Ryan Roberts wrote:
> On 17/11/2025 11:30, Ryan Roberts wrote:
>> Hi All,
>>
>> Over the last few years we had a few complaints that syscall performance on
>> arm64 is slower than x86. Most recently, it was observed that a certain Java
>> benchmark that does a lot of fstat and lseek is spending ~10% of its time in
>> get_random_u16(). Cue a bit of digging, which led me to [1] and also to some new
>> ideas about how performance could be improved.
>> I believe this helps the mean latency significantly without sacrificing any
>> strength. But it doesn't reduce the tail latency because we still have to call
>> into the crng eventually.
>>
>> So here's another idea: Could we use siphash to generate some random bits? We
>> would generate the secret key at boot using the crng. Then generate a 64 bit
>> siphash of (cntvct_el0 ^ tweak) (where tweak increments every time we generate a
>> new hash). As long as the key remains secret, the hash is unpredictable.
>> (perhaps we don't even need the timer value). For every hash we get 64 bits, so
>> that would last for 10 syscalls at 6 bits per call. So we would still have to
>> call siphash every 10 syscalls, so there would still be a tail, but from my
>> experiments, it's much less than the crng:
IIRC, Jason argued against creating another type of prng inside of the
kernel for a special purpose.

As I understand it, the other architectures already just use the cycle
counter because that is random enough, but for arm64 the cntvct runs at
an unspecified frequency that is often too low.

However, most future machines are ARMv9.1 or higher and require a 1GHz
timer frequency. I also checked Graviton-3 (Neoverse-V1, ARMv8.4), which
is running its timer at 1.05GHz.

My M2 Mac has a slower 24MHz timer. Between two getpid()
syscalls, I see cntvct_el0 advance by between 20 and 70 ticks, which
still gives a few bits of entropy but not the six bits we actually
want to use.
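
To put rough numbers on this, here is a minimal sketch in C. It assumes
that the tick delta is spread uniformly over the observed 20..70 range
and that the same wall-clock window would be seen with a 1GHz counter.
Neither assumption was measured, and the helper names span_to_bits() and
scale_ticks() are made up for illustration. The result is therefore an
upper bound on the usable bits, not an entropy estimate.

```c
#include <stdint.h>

/*
 * Whole bits that `span` distinct, equally likely values can supply:
 * floor(log2(span)). Assumes a uniform distribution, so this is an
 * upper bound; real syscall timing jitter will give less.
 */
static unsigned int span_to_bits(uint64_t span)
{
	unsigned int bits = 0;

	while (span > 1) {
		span >>= 1;
		bits++;
	}
	return bits;
}

/*
 * Convert a tick count seen on a counter running at from_hz into the
 * tick count the same time interval would produce at to_hz
 * (truncating integer division).
 */
static uint64_t scale_ticks(uint64_t ticks, uint64_t from_hz, uint64_t to_hz)
{
	return ticks * to_hz / from_hz;
}
```

With the 24MHz figures, span_to_bits(70 - 20 + 1) gives 5, one short of
the six bits wanted. Rescaled to 1GHz, the same window runs from 833 to
2916 ticks, a span of 2084 values, or 11 bits under the same assumptions.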

How about we just check the timer frequency at boot and replace the
get_random_u16() call with a cntvct read if the counter is updated fast
enough?
That would at least take care of the overhead on most new designs and
hopefully on a large subset of the servers that are in active use.
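
As a sketch only, the boot-time decision could look roughly like the C
below. The 1GHz cutoff, the enum, and the names choose_offset_source()
and counter_to_offset() are assumptions for illustration, not existing
kernel interfaces. A real patch would presumably use the kernel's own
code-patching facilities instead of a runtime branch, and timer_hz
stands for whatever counter frequency is known at boot.

```c
#include <stdint.h>

#define OFFSET_BITS	6u		/* six bits per syscall, as discussed above */
#define MIN_TIMER_HZ	1000000000ull	/* assumed cutoff (1GHz), not a tuned value */

/* Hypothetical: which source feeds the per-syscall stack offset. */
enum offset_source {
	OFFSET_FROM_RNG,	/* keep calling get_random_u16() */
	OFFSET_FROM_COUNTER,	/* use the low bits of cntvct instead */
};

/* Decide once at boot, based on the counter frequency. */
static enum offset_source choose_offset_source(uint64_t timer_hz)
{
	return timer_hz >= MIN_TIMER_HZ ? OFFSET_FROM_COUNTER : OFFSET_FROM_RNG;
}

/* Keep only the low OFFSET_BITS bits of a counter reading. */
static uint32_t counter_to_offset(uint64_t counter)
{
	return (uint32_t)(counter & ((1u << OFFSET_BITS) - 1));
}
```

Under this assumed cutoff, a 24MHz counter would stay on the
get_random_u16() path, while a 1.05GHz counter would switch to the
counter read.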
Arnd