Message-ID: <CAMj1kXHX1W2Kz5NJehtzDhOCKWEd6y1sp8+ViOMHGsq9e8=7Pw@mail.gmail.com>
Date: Tue, 2 Dec 2025 10:15:22 +0100
From: Ard Biesheuvel <ardb@...nel.org>
To: Ryan Roberts <ryan.roberts@....com>
Cc: Kees Cook <kees@...nel.org>, Ard Biesheuvel <ardb+git@...gle.com>, Will Deacon <will@...nel.org>,
Arnd Bergmann <arnd@...db.de>, Jeremy Linton <jeremy.linton@....com>,
Catalin Marinas <Catalin.Marinas@....com>, Mark Rutland <mark.rutland@....com>,
"Jason A . Donenfeld" <Jason@...c4.com>, linux-hardening@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v1 2/2] randomize_kstack: Unify random source across arches
On Mon, 1 Dec 2025 at 19:20, Ryan Roberts <ryan.roberts@....com> wrote:
>
> On 28/11/2025 11:01, Ard Biesheuvel wrote:
> > On Thu, 27 Nov 2025 at 12:00, Ryan Roberts <ryan.roberts@....com> wrote:
> >>
> >> Previously, different architectures were using random sources of
> >> differing strength and cost to decide the random kstack offset. A number
> >> of architectures (loongarch, powerpc, s390, x86) were using their
> >> timestamp counter, at whatever the frequency happened to be. Other
> >> arches (arm64, riscv) were using entropy from the crng via
> >> get_random_u16().
> >>
> >> There have been concerns that in some cases the timestamp counters may
> >> be too weak, because they can be easily guessed or influenced by user
> >> space. And get_random_u16() has been shown to be too costly for the
> >> level of protection kstack offset randomization provides.
> >>
> >> So let's use a common, architecture-agnostic source of entropy: a
> >> per-task prng, seeded at fork-time from the crng. This has a few
> >> benefits:
> >>
> >> - We can remove choose_random_kstack_offset(); that was only there to
> >> try to make the timestamp counter value a bit harder to influence
> >> from user space.
> >>
> >> - The architecture code is simplified. All it has to do now is call
> >> add_random_kstack_offset() in the syscall path.
> >>
> >> - The strength of the randomness can be reasoned about independently
> >> of the architecture.
> >>
> >> - Arches previously using get_random_u16() now have much faster
> >> syscall paths; see the results below.
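
(Side note: to illustrate the second bullet - after this change, the
per-arch hook reduces to a single call at syscall entry. On arm64,
for instance, what remains is roughly:

static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
			   unsigned int sc_nr,
			   const syscall_fn_t syscall_table[])
{
	add_random_kstack_offset();
	/* ... dispatch the syscall as before ... */
}

with the separate choose_random_kstack_offset() call on the exit path
gone entirely.)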
> >>
> >> There have been some claims that a prng may be weaker than the
> >> timestamp counter if not regularly reseeded. But the prng has a period
> >> of about 2^113, so as long as the prng state remains secret, it should
> >> not be possible to guess future outputs. If the prng state can be
> >> accessed, we have bigger problems.
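
(For reference, the prng in question is prandom_u32_state(), i.e.,
LFSR113 in lib/random32.c - quoting from memory:

u32 prandom_u32_state(struct rnd_state *state)
{
#define TAUSWORTHE(s, a, b, c, d) ((s & c) << d) ^ (((s << a) ^ s) >> b)
	state->s1 = TAUSWORTHE(state->s1,  6U, 13U, 4294967294U, 18U);
	state->s2 = TAUSWORTHE(state->s2,  2U, 27U, 4294967288U,  2U);
	state->s3 = TAUSWORTHE(state->s3, 13U, 21U, 4294967280U,  7U);
	state->s4 = TAUSWORTHE(state->s4,  3U, 12U, 4294967168U, 13U);
	return (state->s1 ^ state->s2 ^ state->s3 ^ state->s4);
}

i.e., four combined Tausworthe LFSRs, which is where the ~2^113
period figure comes from.)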
> >>
> >> Additionally, we only consume 6 bits to randomize the stack, so there
> >> are just 64 possible random offsets. It would be trivial for an
> >> attacker to brute force this by repeating their attack until the
> >> random stack offset happens to be the desired one. The prng approach
> >> seems entirely proportional to this level of protection.
> >>
> >> Performance data are provided below. The baseline is v6.18-rc5 with
> >> rndstack on for each respective arch. (I)/(R) indicate statistically
> >> significant improvement/regression. arm64 platform is AWS Graviton3.
> >> x86_64 platform is AWS Sapphire Rapids:
> >>
> >> +-----------------+--------------+---------------+---------------+
> >> | Benchmark | Result Class | per-task-prng | per-task-prng |
> >> | | | arm64 | x86_64 |
> >> +=================+==============+===============+===============+
> >> | syscall/getpid | mean (ns) | (I) -10.54% | (I) -7.69% |
> >> | | p99 (ns) | (I) -59.53% | 4.14% |
> >> | | p99.9 (ns) | (I) -59.90% | 2.68% |
> >> +-----------------+--------------+---------------+---------------+
> >> | syscall/getppid | mean (ns) | (I) -10.49% | (I) -5.98% |
> >> | | p99 (ns) | (I) -59.83% | -3.11% |
> >> | | p99.9 (ns) | (I) -59.88% | (R) 9.84% |
> >> +-----------------+--------------+---------------+---------------+
> >> | syscall/invalid | mean (ns) | (I) -9.28% | (I) -6.94% |
> >> | | p99 (ns) | (I) -61.06% | (I) -5.57% |
> >> | | p99.9 (ns) | (I) -61.40% | (R) 10.53% |
> >> +-----------------+--------------+---------------+---------------+
> >>
> >> Signed-off-by: Ryan Roberts <ryan.roberts@....com>
> >> ---
> >> arch/Kconfig | 5 ++--
> >> arch/arm64/kernel/syscall.c | 11 -------
> >> arch/loongarch/kernel/syscall.c | 11 -------
> >> arch/powerpc/kernel/syscall.c | 12 --------
> >> arch/riscv/kernel/traps.c | 12 --------
> >> arch/s390/include/asm/entry-common.h | 8 ------
> >> arch/x86/include/asm/entry-common.h | 12 --------
> >> include/linux/prandom.h | 7 ++++-
> >> include/linux/randomize_kstack.h | 43 ++++++++--------------------
> >> include/linux/sched.h | 3 +-
> >> 10 files changed, 22 insertions(+), 102 deletions(-)
> >>
> >
> > I think this is a substantial improvement over the status quo,
> > especially with all the dodgy uses of monotonic counters with
> > unspecified granularity.
> >
> > So with two comments below:
> >
> > Reviewed-by: Ard Biesheuvel <ardb@...nel.org>
>
> Thanks!
>
> I shall plan to re-post without the RFC tag against -rc1 then, and we will see
> if anyone pushes back.
>
Sounds good.
...
> >> diff --git a/include/linux/prandom.h b/include/linux/prandom.h
> >> index f2ed5b72b3d6..827edde11cb9 100644
> >> --- a/include/linux/prandom.h
> >> +++ b/include/linux/prandom.h
> >> @@ -10,13 +10,18 @@
> >>
> >> #include <linux/types.h>
> >> #include <linux/once.h>
> >> -#include <linux/percpu.h>
> >> #include <linux/random.h>
> >>
> >> struct rnd_state {
> >> __u32 s1, s2, s3, s4;
> >> };
> >>
> >> +/*
> >> + * percpu.h includes sched.h, which requires struct rnd_state. So defer until
> >> + * after struct rnd_state is defined.
> >> + */
> >> +#include <linux/percpu.h>
> >> +
> >
> > Yuck. Is this the best we can do to disentangle this?
>
> I think I might be able to move some stuff out of sched.h into alloc_tag.h
> (looks to me like it should have been there all along). And that will allow
> removing the sched.h include from percpu.h. I think that will solve it.
>
OK.
> >> diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
> >> index 089b1432f7e6..83c7e6710f6d 100644
> >> --- a/include/linux/randomize_kstack.h
> >> +++ b/include/linux/randomize_kstack.h
> >> @@ -6,6 +6,7 @@
> >> #include <linux/kernel.h>
> >> #include <linux/jump_label.h>
> >> #include <linux/percpu-defs.h>
> >> +#include <linux/prandom.h>
> >>
> >> DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
> >> randomize_kstack_offset);
> >> @@ -45,9 +46,13 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
> >> #define KSTACK_OFFSET_MAX(x) ((x) & 0b1111111100)
> >> #endif
> >>
> >> +static __always_inline u32 get_update_kstack_offset(void)
> >> +{
> >> + return prandom_u32_state(&current->kstack_rnd_state);
>
> I've got bot warnings because this is being called from noinstr code. I guess
> the best option is to just move add_random_kstack_offset() to after
> instrumentation is enabled for the affected arches.
>
Just put instrumentation_begin()/instrumentation_end() around the call
to prandom_u32_state() - that seems to be the common approach for
punching holes into the 'noinstr' validation.
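
I.e., something like this (untested):

static __always_inline u32 get_update_kstack_offset(void)
{
	u32 offset;

	instrumentation_begin();
	offset = prandom_u32_state(&current->kstack_rnd_state);
	instrumentation_end();

	return offset;
}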
> >> +}
> >> +
> >> /**
> >> - * add_random_kstack_offset - Increase stack utilization by previously
> >> - * chosen random offset
> >> + * add_random_kstack_offset - Increase stack utilization by a random offset.
> >> *
> >> * This should be used in the syscall entry path after user registers have been
> >> * stored to the stack. Preemption may be enabled. For testing the resulting
> >> @@ -56,46 +61,22 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
> >> #define add_random_kstack_offset() do { \
> >> if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
> >> &randomize_kstack_offset)) { \
> >> - u32 offset = current->kstack_offset; \
> >> + u32 offset = get_update_kstack_offset(); \
> >> u8 *ptr = __kstack_alloca(KSTACK_OFFSET_MAX(offset)); \
> >> /* Keep allocation even after "ptr" loses scope. */ \
> >> asm volatile("" :: "r"(ptr) : "memory"); \
> >> } \
> >> } while (0)
> >>
> >> -/**
> >> - * choose_random_kstack_offset - Choose the random offset for the next
> >> - * add_random_kstack_offset()
> >> - *
> >> - * This should only be used during syscall exit. Preemption may be enabled. This
> >> - * position in the syscall flow is done to frustrate attacks from userspace
> >> - * attempting to learn the next offset:
> >> - * - Maximize the timing uncertainty visible from userspace: if the
> >> - * offset is chosen at syscall entry, userspace has much more control
> >> - * over the timing between choosing offsets. "How long will we be in
> >> - * kernel mode?" tends to be more difficult to predict than "how long
> >> - * will we be in user mode?"
> >> - * - Reduce the lifetime of the new offset sitting in memory during
> >> - * kernel mode execution. Exposure of "thread-local" memory content
> >> - * (e.g. current, percpu, etc) tends to be easier than arbitrary
> >> - * location memory exposure.
> >> - */
> >> -#define choose_random_kstack_offset(rand) do { \
> >> - if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
> >> - &randomize_kstack_offset)) { \
> >> - u32 offset = current->kstack_offset; \
> >> - offset = ror32(offset, 5) ^ (rand); \
> >> - current->kstack_offset = offset; \
> >> - } \
> >> -} while (0)
> >> -
> >> static inline void random_kstack_task_init(struct task_struct *tsk)
> >> {
> >> - current->kstack_offset = 0;
> >> + if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
> >> + &randomize_kstack_offset)) {
> >> + prandom_seed_state(&tsk->kstack_rnd_state, get_random_u64());
> >
> > We should either fix prandom_seed_state() not to truncate the u64 to
> > u32, or even better, refactor prandom_seed_full_state() so we can
> > reuse it here, and use a 128-bit seed directly.
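
(To spell out the truncation issue: prandom_seed_state() folds the
u64 seed into a single u32 and derives all four state words from that
one value - from include/linux/prandom.h, quoting from memory:

static inline void prandom_seed_state(struct rnd_state *state, u64 seed)
{
	u32 i = ((seed >> 32) ^ (seed << 10) ^ seed) & 0xffffffffUL;

	state->s1 = __seed(i,   2U);
	state->s2 = __seed(i,   8U);
	state->s3 = __seed(i,  16U);
	state->s4 = __seed(i, 128U);
}

so at most 32 bits of the seed actually make it into the state.)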
>
> How about something like this:
>
Looks good to me, but it does make me wonder why a full prandom state
(i.e., one per cpu) isn't sufficient here, and why we are putting one
in every task. Is it just to avoid the overhead of dis/enabling
preemption on every syscall entry?
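
To illustrate, the alternative I have in mind is roughly the
following (seeding omitted):

static DEFINE_PER_CPU(struct rnd_state, kstack_rnd_state);

static __always_inline u32 get_update_kstack_offset(void)
{
	u32 offset;

	/* get_cpu_ptr() disables preemption across the state update */
	offset = prandom_u32_state(get_cpu_ptr(&kstack_rnd_state));
	put_cpu_ptr(&kstack_rnd_state);

	return offset;
}

where the get_cpu_ptr()/put_cpu_ptr() pair is exactly the
dis/enabling of preemption referred to above.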
>
> ---8<---
> diff --git a/include/linux/prandom.h b/include/linux/prandom.h
> index f2ed5b72b3d6..9b651c9b3448 100644
> --- a/include/linux/prandom.h
> +++ b/include/linux/prandom.h
> @@ -19,10 +19,11 @@ struct rnd_state {
>
> u32 prandom_u32_state(struct rnd_state *state);
> void prandom_bytes_state(struct rnd_state *state, void *buf, size_t nbytes);
> -void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);
> +void prandom_seed_full_state_one(struct rnd_state *state);
> +void prandom_seed_full_state_all(struct rnd_state __percpu *pcpu_state);
>
> #define prandom_init_once(pcpu_state) \
> - DO_ONCE(prandom_seed_full_state, (pcpu_state))
> + DO_ONCE(prandom_seed_full_state_all, (pcpu_state))
>
> /*
> * Handle minimum values for seeds
> diff --git a/lib/random32.c b/lib/random32.c
> index 24e7acd9343f..50d8f5f9fca7 100644
> --- a/lib/random32.c
> +++ b/lib/random32.c
> @@ -107,24 +107,28 @@ static void prandom_warmup(struct rnd_state *state)
> prandom_u32_state(state);
> }
>
> -void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state)
> +void prandom_seed_full_state_one(struct rnd_state *state)
> {
> - int i;
> + u32 seeds[4];
>
> - for_each_possible_cpu(i) {
> - struct rnd_state *state = per_cpu_ptr(pcpu_state, i);
> - u32 seeds[4];
> + get_random_bytes(&seeds, sizeof(seeds));
> + state->s1 = __seed(seeds[0], 2U);
> + state->s2 = __seed(seeds[1], 8U);
> + state->s3 = __seed(seeds[2], 16U);
> + state->s4 = __seed(seeds[3], 128U);
>
> - get_random_bytes(&seeds, sizeof(seeds));
> - state->s1 = __seed(seeds[0], 2U);
> - state->s2 = __seed(seeds[1], 8U);
> - state->s3 = __seed(seeds[2], 16U);
> - state->s4 = __seed(seeds[3], 128U);
> + prandom_warmup(state);
> +}
> +EXPORT_SYMBOL(prandom_seed_full_state_one);
>
> - prandom_warmup(state);
> - }
> +void prandom_seed_full_state_all(struct rnd_state __percpu *pcpu_state)
> +{
> + int i;
> +
> + for_each_possible_cpu(i)
> + prandom_seed_full_state_one(per_cpu_ptr(pcpu_state, i));
> }
> -EXPORT_SYMBOL(prandom_seed_full_state);
> +EXPORT_SYMBOL(prandom_seed_full_state_all);
>
> #ifdef CONFIG_RANDOM32_SELFTEST
> static struct prandom_test1 {
> ---8<---
>