Message-ID: <a5afc696-b288-433c-8fda-12b9126c02f1@arm.com>
Date: Thu, 27 Nov 2025 11:50:04 +0000
From: Ryan Roberts <ryan.roberts@....com>
To: Kees Cook <kees@...nel.org>, Ard Biesheuvel <ardb@...nel.org>
Cc: Will Deacon <will@...nel.org>, Arnd Bergmann <arnd@...db.de>,
Jeremy Linton <jeremy.linton@....com>,
Catalin Marinas <Catalin.Marinas@....com>,
Mark Rutland <mark.rutland@....com>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [DISCUSSION] kstack offset randomization: bugs and performance
On 27/11/2025 08:00, Kees Cook wrote:
> On Wed, Nov 26, 2025 at 11:58:40PM +0100, Ard Biesheuvel wrote:
>> On Tue, 25 Nov 2025 at 12:15, Ryan Roberts <ryan.roberts@....com> wrote:
>>>
>>> On 24/11/2025 20:51, Kees Cook wrote:
>>>> On Mon, Nov 24, 2025 at 05:50:14PM +0000, Ryan Roberts wrote:
>>>>> On 24/11/2025 17:11, Kees Cook wrote:
>>>>>>
>>>>>>
>>>>>> On November 24, 2025 6:36:25 AM PST, Will Deacon <will@...nel.org> wrote:
>>>>>>> On Mon, Nov 17, 2025 at 11:31:22AM +0000, Ryan Roberts wrote:
>>>>>>>> On 17/11/2025 11:30, Ryan Roberts wrote:
>>>>>>>>> Could this give us a middle ground between strong-crng and
>>>>>>>>> weak-timestamp-counter? Perhaps the main issue is that we need to store the
>>>>>>>>> secret key for a long period?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Anyway, I plan to work up a series with the bugfixes and performance
>>>>>>>>> improvements. I'll add the siphash approach as an experimental addition and get
>>>>>>>>> some more detailed numbers for all the options. But wanted to raise it all here
>>>>>>>>> first to get any early feedback.
>>>>>>>
>>>>>>> FWIW, I share Mark's concerns about using a counter for this. Given that
>>>>>>> the feature currently appears to be both slow _and_ broken I'd vote for
>>>>>>> either removing it or switching over to per-thread offsets as a first
>>>>>>> step.
>>>>>>
>>>>>> That it has potential weaknesses doesn't mean it should be entirely removed.
>>>>>>
>>>>>>> We already have a per-task stack canary with
>>>>>>> CONFIG_STACKPROTECTOR_PER_TASK so I don't understand the reluctance to
>>>>>>> do something similar here.
>>>>>>
>>>>>> That's not a reasonable comparison: the stack canary cannot change arbitrarily for a task or it would immediately crash on a function return. :)
>>>>>>
>>>>>>> Speeding up the crypto feels like something that could happen separately.
>>>>>>
>>>>>> Sure. But let's see what Ryan's patches look like. The suggested changes sound good to me.
>>>>>
>>>>> Just to say I haven't forgotten about this; I ended up having to switch to
>>>>> something more urgent. Hoping to get back to it later this week. I don't think
>>>>> this is an urgent issue, so hopefully folks are ok waiting.
>>>>>
>>>>> I propose to post whatever I end up with, then we can all discuss from
>>>>> there. But the rough shape so far:
>>>>> But the rough shape so far:
>>>>>
>>>>> Fixes:
>>>>> - Remove choose_random_kstack_offset()
>>>>> - arch passes random into add_random_kstack_offset() (fixes migration bypass)
>>>>> - Move add_random_kstack_offset() to el0_svc()/el0_svc_compat() (before
>>>>> enabling interrupts) to fix non-preemption requirement (arm64)
>>>>
>>>> I thought we'd keep choose_random_kstack_offset() and just move
>>>> everything into a per-task location? (And for arm64 only)
>>>
>>> Err... I thought you were the one arguing against per-task state? I'm not really
>>> keen on having arm64 do a completely different thing to everyone else; it seems
>>> reasonable to me that we only need to (continue to) abstract the random source
>>> per-arch and the rest should remain common?
>>>
>>> Per my previous mails, I'm not really sure what choose_random_kstack_offset() is
>>> giving us in practice. Why not simplify?
>>>
>>
>> It seems to me that arm64 inherited a lot of complexity from other
>> architectures that rely on a counter. AIUI, the 'non-preemption
>> requirement' and the 'migration bypass' issues are both a result of
>> the fact that the syscall duration is relied upon to provide more
>> pseudo-randomness, which is recorded in a per-CPU counter when the
>> syscall completes and carried over to the next invocation. (I don't
>> really buy the idea that having the next offset in memory for a
>> shorter time buys us anything in terms of robustness tbh)
Agreed.
>>
>> I have some patches that make get_random_uXX()'s fast path lockless
>> that I will send out shortly (assuming no hate mail from the bots in
>> the mean time), which should make it suitable for general use as the
>> entropy source for kstack randomization on all architectures, modulo
>> the tail latency issue, but I'm not sure I understand why that is a
>> problem to begin with if it occurs sufficiently rarely. Is that a
>> PREEMPT_RT issue?
Yes; RT was Jeremy's original motivation for looking at the PRNG approach.
For the issue I see, improving the mean would be sufficient, but improving the
tail too is a bonus.
>> Would it be better if the refill of the per-CPU
>> batched entropy buffers was relegated to some kind of kthread so it
>> can be scheduled independently? (Those buffers are all the same size
>> so we could easily keep a few hot spares)
That came up in Jeremy's thread last year. My understanding was that this would
not help: either the thread is lower priority, in which case you can't
guarantee it will run, or it is higher priority, in which case the RT thread
still takes the glitch. (But I'm hand-waving - I'm no expert on the details.)
>>
>> TL;DR I agree with Ryan but I'd like to fix it across the board, i.e.,
>> get rid of the per-CPU variable entirely and all the dodgy uses of
>> counters.
>
> Right, the whole design of picking the next random number after the
> syscall is to try to reduce the control an attacker would have, given
> that an instruction counter is the randomness source in most cases
> (during the original design, no one could agree on a fast enough
> randomness source).
>
> If we can fix that then we can fix this for all architectures and all
> randomness users generally.
>
> -Kees
>