Message-ID: <CAMj1kXEJSmYXPNiHO4woBE8rVFOxkfuKOJ9JGacVC76GqkqY+w@mail.gmail.com>
Date: Thu, 27 Nov 2025 13:19:48 +0100
From: Ard Biesheuvel <ardb@...nel.org>
To: Ryan Roberts <ryan.roberts@....com>
Cc: Kees Cook <kees@...nel.org>, Will Deacon <will@...nel.org>, Arnd Bergmann <arnd@...db.de>,
Jeremy Linton <jeremy.linton@....com>, Catalin Marinas <Catalin.Marinas@....com>,
Mark Rutland <mark.rutland@....com>,
"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [DISCUSSION] kstack offset randomization: bugs and performance
On Thu, 27 Nov 2025 at 12:50, Ryan Roberts <ryan.roberts@....com> wrote:
>
> On 27/11/2025 08:00, Kees Cook wrote:
> > On Wed, Nov 26, 2025 at 11:58:40PM +0100, Ard Biesheuvel wrote:
...
> >> the tail latency issue, but I'm not sure I understand why that is a
> >> problem to begin with if it occurs sufficiently rarely. Is that a
> >> PREEMPT_RT issue?
>
> Yes; RT was Jeremy's original motivation for looking at the prng approach.
>
> For the issue I see, improving the mean would be sufficient, but improving the
> tail too is a bonus.
>
> >> Would it be better if the refill of the per-CPU
> >> batched entropy buffers was relegated to some kind of kthread so it
> >> can be scheduled independently? (Those buffers are all the same size
> >> so we could easily keep a few hot spares)
>
> That came up in Jeremy's thread last year. My understanding was that this would
> not help because either the thread is lower priority, in which case you can't
> guarantee it will run, or it is higher priority, in which case the RT thread
> still takes the glitch. (But I'm hand waving - I'm not expert on the details).
>
PREEMPT_RT is generally more concerned about the worst case latency
being bounded rather than being as low as possible.

The get_random fallback runs a few rounds of ChaCha20, which takes
more time than just reading the next value and bumping the position
counter. But that does not imply it fails to meet RT constraints.
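
To make the fast path / slow path distinction concrete, here is a
minimal userspace sketch. It is not the kernel's code: struct batch,
BATCH_WORDS, batch_refill(), batch_next() and count_refills() are names
made up for illustration, and the refill uses a trivial xorshift as a
stand-in for ChaCha20. The only point is that the expensive step runs
once per batch, so its cost is bounded and amortized.

```c
#include <assert.h>
#include <stdint.h>

#define BATCH_WORDS 16	/* hypothetical batch size, not the kernel's */

struct batch {
	uint32_t buf[BATCH_WORDS];
	unsigned int pos;	/* next unread word; BATCH_WORDS means empty */
	unsigned long refills;	/* slow-path counter, for illustration only */
	uint32_t seed;		/* state of the stand-in generator, must be non-zero */
};

/*
 * Slow path: regenerate the whole buffer. This is a stand-in for the
 * ChaCha20 block function (assumption: plain xorshift32, which is NOT
 * cryptographically secure).
 */
static void batch_refill(struct batch *b)
{
	for (unsigned int i = 0; i < BATCH_WORDS; i++) {
		b->seed ^= b->seed << 13;
		b->seed ^= b->seed >> 17;
		b->seed ^= b->seed << 5;
		b->buf[i] = b->seed;
	}
	b->pos = 0;
	b->refills++;
}

/* Fast path: read the next value and bump the position counter. */
static uint32_t batch_next(struct batch *b)
{
	if (b->pos >= BATCH_WORDS)
		batch_refill(b);	/* taken once per BATCH_WORDS calls */
	assert(b->pos < BATCH_WORDS);
	return b->buf[b->pos++];
}

/* Helper: how many refills do 'calls' consecutive reads trigger? */
static unsigned long count_refills(unsigned int calls)
{
	struct batch b = { .pos = BATCH_WORDS, .seed = 0x12345678u };

	for (unsigned int i = 0; i < calls; i++)
		(void)batch_next(&b);
	return b.refills;
}
```

With these made-up numbers, only one call in every 16 takes the slow
path. The worst case is a single refill, which is the bound that would
matter for RT, as opposed to the mean.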
And if a thread running ChaCha20 in the background fails to get enough
cycles, it is not an RT problem, it is an ordinary starvation problem,
which can only be solved by doing less work in total. But cranking
prandom_u32_state() on every syscall is not free either.

In summary, it would be good to have a better problem statement wrt RT
constraints before assuming that 99% tail latency is something to
obsess about, especially given the fact that getpid() is not that
representative a syscall to begin with.