Message-ID: <CAMj1kXHX1W2Kz5NJehtzDhOCKWEd6y1sp8+ViOMHGsq9e8=7Pw@mail.gmail.com>
Date: Tue, 2 Dec 2025 10:15:22 +0100
From: Ard Biesheuvel <ardb@...nel.org>
To: Ryan Roberts <ryan.roberts@....com>
Cc: Kees Cook <kees@...nel.org>, Ard Biesheuvel <ardb+git@...gle.com>, Will Deacon <will@...nel.org>, 
	Arnd Bergmann <arnd@...db.de>, Jeremy Linton <jeremy.linton@....com>, 
	Catalin Marinas <Catalin.Marinas@....com>, Mark Rutland <mark.rutland@....com>, 
	"Jason A . Donenfeld" <Jason@...c4.com>, linux-hardening@...r.kernel.org, 
	linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v1 2/2] randomize_kstack: Unify random source across arches

On Mon, 1 Dec 2025 at 19:20, Ryan Roberts <ryan.roberts@....com> wrote:
>
> On 28/11/2025 11:01, Ard Biesheuvel wrote:
> > On Thu, 27 Nov 2025 at 12:00, Ryan Roberts <ryan.roberts@....com> wrote:
> >>
> >> Previously, different architectures used random sources of differing
> >> strength and cost to derive the random kstack offset. A number of
> >> architectures (loongarch, powerpc, s390, x86) used their timestamp
> >> counter, at whatever frequency it happened to run. Other arches
> >> (arm64, riscv) used entropy from the crng via get_random_u16().
> >>
> >> There have been concerns that in some cases the timestamp counters may
> >> be too weak, because they can be easily guessed or influenced by user
> >> space. And get_random_u16() has been shown to be too costly for the
> >> level of protection kstack offset randomization provides.
> >>
> >> So let's use a common, architecture-agnostic source of entropy: a
> >> per-task prng, seeded at fork time from the crng. This has a few
> >> benefits:
> >>
> >>   - We can remove choose_random_kstack_offset(); it was only there to
> >>     try to make the timestamp counter value a bit harder to influence
> >>     from user space.
> >>
> >>   - The architecture code is simplified. All it has to do now is call
> >>     add_random_kstack_offset() in the syscall path.
> >>
> >>   - The strength of the randomness can be reasoned about independently
> >>     of the architecture.
> >>
> >>   - Arches previously using get_random_u16() now have much faster
> >>     syscall paths; see the results below.
> >>
> >> There have been some claims that a prng may be weaker than the
> >> timestamp counter if it is not regularly reseeded. But the prng has a
> >> period of about 2^113, so as long as the prng state remains secret,
> >> its output should not be guessable. If the prng state can be accessed,
> >> we have bigger problems.
> >>
> >> Additionally, we are only consuming 6 bits to randomize the stack, so
> >> there are only 64 possible random offsets. It would be trivial for an
> >> attacker to brute force the offset by repeating their attack until the
> >> random stack offset happens to be the desired one; with 64 equally
> >> likely offsets, roughly 44 attempts already give a ~50% chance of
> >> hitting any particular offset. The prng approach seems entirely
> >> proportional to this level of protection.
> >>
> >> Performance data are provided below. The baseline is v6.18-rc5 with
> >> rndstack on for each respective arch. (I)/(R) indicate statistically
> >> significant improvement/regression. arm64 platform is AWS Graviton3.
> >> x86_64 platform is AWS Sapphire Rapids:
> >>
> >> +-----------------+--------------+---------------+---------------+
> >> | Benchmark       | Result Class | per-task-prng | per-task-prng |
> >> |                 |              |         arm64 |        x86_64 |
> >> +=================+==============+===============+===============+
> >> | syscall/getpid  | mean (ns)    |   (I) -10.54% |    (I) -7.69% |
> >> |                 | p99 (ns)     |   (I) -59.53% |         4.14% |
> >> |                 | p99.9 (ns)   |   (I) -59.90% |         2.68% |
> >> +-----------------+--------------+---------------+---------------+
> >> | syscall/getppid | mean (ns)    |   (I) -10.49% |    (I) -5.98% |
> >> |                 | p99 (ns)     |   (I) -59.83% |        -3.11% |
> >> |                 | p99.9 (ns)   |   (I) -59.88% |     (R) 9.84% |
> >> +-----------------+--------------+---------------+---------------+
> >> | syscall/invalid | mean (ns)    |    (I) -9.28% |    (I) -6.94% |
> >> |                 | p99 (ns)     |   (I) -61.06% |    (I) -5.57% |
> >> |                 | p99.9 (ns)   |   (I) -61.40% |    (R) 10.53% |
> >> +-----------------+--------------+---------------+---------------+
> >>
> >> Signed-off-by: Ryan Roberts <ryan.roberts@....com>
> >> ---
> >>  arch/Kconfig                         |  5 ++--
> >>  arch/arm64/kernel/syscall.c          | 11 -------
> >>  arch/loongarch/kernel/syscall.c      | 11 -------
> >>  arch/powerpc/kernel/syscall.c        | 12 --------
> >>  arch/riscv/kernel/traps.c            | 12 --------
> >>  arch/s390/include/asm/entry-common.h |  8 ------
> >>  arch/x86/include/asm/entry-common.h  | 12 --------
> >>  include/linux/prandom.h              |  7 ++++-
> >>  include/linux/randomize_kstack.h     | 43 ++++++++--------------------
> >>  include/linux/sched.h                |  3 +-
> >>  10 files changed, 22 insertions(+), 102 deletions(-)
> >>
> >
> > I think this is a substantial improvement over the status quo,
> > especially with all the dodgy uses of monotonic counters with
> > unspecified granularity.
> >
> > So with two comments below:
> >
> > Reviewed-by: Ard Biesheuvel <ardb@...nel.org>
>
> Thanks!
>
> I shall plan to re-post without the RFC tag against -rc1 then, and we will see
> if anyone pushes back.
>

Sounds good.

...
> >> diff --git a/include/linux/prandom.h b/include/linux/prandom.h
> >> index f2ed5b72b3d6..827edde11cb9 100644
> >> --- a/include/linux/prandom.h
> >> +++ b/include/linux/prandom.h
> >> @@ -10,13 +10,18 @@
> >>
> >>  #include <linux/types.h>
> >>  #include <linux/once.h>
> >> -#include <linux/percpu.h>
> >>  #include <linux/random.h>
> >>
> >>  struct rnd_state {
> >>         __u32 s1, s2, s3, s4;
> >>  };
> >>
> >> +/*
> >> + * percpu.h includes sched.h, which requires struct rnd_state. So defer until
> >> + * after struct rnd_state is defined.
> >> + */
> >> +#include <linux/percpu.h>
> >> +
> >
> > Yuck. Is this the best we can do to disentangle this?
>
> I think I might be able to move some stuff out of sched.h into alloc_tag.h
> (looks to me like it should have been there all along), and that will allow
> removing the sched.h include from percpu.h. I think that will solve it.
>

OK.

> >> diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
> >> index 089b1432f7e6..83c7e6710f6d 100644
> >> --- a/include/linux/randomize_kstack.h
> >> +++ b/include/linux/randomize_kstack.h
> >> @@ -6,6 +6,7 @@
> >>  #include <linux/kernel.h>
> >>  #include <linux/jump_label.h>
> >>  #include <linux/percpu-defs.h>
> >> +#include <linux/prandom.h>
> >>
> >>  DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
> >>                          randomize_kstack_offset);
> >> @@ -45,9 +46,13 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
> >>  #define KSTACK_OFFSET_MAX(x)   ((x) & 0b1111111100)
> >>  #endif
> >>
> >> +static __always_inline u32 get_update_kstack_offset(void)
> >> +{
> >> +       return prandom_u32_state(&current->kstack_rnd_state);
>
> I've got bot warnings because this is being called from noinstr code. I guess
> the best option is to just move add_random_kstack_offset() to after
> instrumentation is enabled for the affected arches.
>

Just put instrumentation_begin()/instrumentation_end() around the call
to prandom_u32_state() - that seems to be the common approach for
punching holes into the 'noinstr' validation.
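
Something along these lines, i.e. (sketch only, not even compile tested,
assuming the helper keeps the shape from your patch):

static __always_inline u32 get_update_kstack_offset(void)
{
	u32 offset;

	/*
	 * prandom_u32_state() is instrumentable, so open a narrow
	 * instrumentation window around the call to keep the noinstr
	 * validation happy.
	 */
	instrumentation_begin();
	offset = prandom_u32_state(&current->kstack_rnd_state);
	instrumentation_end();

	return offset;
}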


> >> +}
> >> +
> >>  /**
> >> - * add_random_kstack_offset - Increase stack utilization by previously
> >> - *                           chosen random offset
> >> + * add_random_kstack_offset - Increase stack utilization by a random offset.
> >>   *
> >>   * This should be used in the syscall entry path after user registers have been
> >>   * stored to the stack. Preemption may be enabled. For testing the resulting
> >> @@ -56,46 +61,22 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
> >>  #define add_random_kstack_offset() do {                                        \
> >>         if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
> >>                                 &randomize_kstack_offset)) {            \
> >> -               u32 offset = current->kstack_offset;                    \
> >> +               u32 offset = get_update_kstack_offset();                \
> >>                 u8 *ptr = __kstack_alloca(KSTACK_OFFSET_MAX(offset));   \
> >>                 /* Keep allocation even after "ptr" loses scope. */     \
> >>                 asm volatile("" :: "r"(ptr) : "memory");                \
> >>         }                                                               \
> >>  } while (0)
> >>
> >> -/**
> >> - * choose_random_kstack_offset - Choose the random offset for the next
> >> - *                              add_random_kstack_offset()
> >> - *
> >> - * This should only be used during syscall exit. Preemption may be enabled. This
> >> - * position in the syscall flow is done to frustrate attacks from userspace
> >> - * attempting to learn the next offset:
> >> - * - Maximize the timing uncertainty visible from userspace: if the
> >> - *   offset is chosen at syscall entry, userspace has much more control
> >> - *   over the timing between choosing offsets. "How long will we be in
> >> - *   kernel mode?" tends to be more difficult to predict than "how long
> >> - *   will we be in user mode?"
> >> - * - Reduce the lifetime of the new offset sitting in memory during
> >> - *   kernel mode execution. Exposure of "thread-local" memory content
> >> - *   (e.g. current, percpu, etc) tends to be easier than arbitrary
> >> - *   location memory exposure.
> >> - */
> >> -#define choose_random_kstack_offset(rand) do {                         \
> >> -       if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
> >> -                               &randomize_kstack_offset)) {            \
> >> -               u32 offset = current->kstack_offset;                    \
> >> -               offset = ror32(offset, 5) ^ (rand);                     \
> >> -               current->kstack_offset = offset;                        \
> >> -       }                                                               \
> >> -} while (0)
> >> -
> >>  static inline void random_kstack_task_init(struct task_struct *tsk)
> >>  {
> >> -       current->kstack_offset = 0;
> >> +       if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
> >> +                               &randomize_kstack_offset)) {
> >> +               prandom_seed_state(&tsk->kstack_rnd_state, get_random_u64());
> >
> > We should either fix prandom_seed_state() not to truncate the u64 to
> > u32, or even better, refactor prandom_seed_full_state() so we can
> > reuse it here, and use a 128-bit seed directly.
>
> How about something like this:
>

Looks good to me, but it does make me wonder why a full prandom state
(i.e., one per CPU) isn't sufficient here, and why we are putting one
in every task. Is it just to avoid the overhead of enabling/disabling
preemption on every syscall entry?
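
For reference, the per-cpu variant I have in mind would look roughly
like this (just a sketch, untested; the DEFINE_PER_CPU variable is made
up for illustration):

static DEFINE_PER_CPU(struct rnd_state, kstack_rnd_state);

static __always_inline u32 get_update_kstack_offset(void)
{
	u32 offset;

	/*
	 * get_cpu_var() disables preemption so the generator state
	 * cannot be torn by migration while we advance it.
	 */
	offset = prandom_u32_state(&get_cpu_var(kstack_rnd_state));
	put_cpu_var(kstack_rnd_state);

	return offset;
}

which swaps the per-task state for a preempt_disable()/preempt_enable()
pair in the syscall entry path.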





>
> ---8<---
> diff --git a/include/linux/prandom.h b/include/linux/prandom.h
> index f2ed5b72b3d6..9b651c9b3448 100644
> --- a/include/linux/prandom.h
> +++ b/include/linux/prandom.h
> @@ -19,10 +19,11 @@ struct rnd_state {
>
>  u32 prandom_u32_state(struct rnd_state *state);
>  void prandom_bytes_state(struct rnd_state *state, void *buf, size_t nbytes);
> -void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);
> +void prandom_seed_full_state_one(struct rnd_state *state);
> +void prandom_seed_full_state_all(struct rnd_state __percpu *pcpu_state);
>
>  #define prandom_init_once(pcpu_state)                  \
> -       DO_ONCE(prandom_seed_full_state, (pcpu_state))
> +       DO_ONCE(prandom_seed_full_state_all, (pcpu_state))
>
>  /*
>   * Handle minimum values for seeds
> diff --git a/lib/random32.c b/lib/random32.c
> index 24e7acd9343f..50d8f5f9fca7 100644
> --- a/lib/random32.c
> +++ b/lib/random32.c
> @@ -107,24 +107,28 @@ static void prandom_warmup(struct rnd_state *state)
>         prandom_u32_state(state);
>  }
>
> -void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state)
> +void prandom_seed_full_state_one(struct rnd_state *state)
>  {
> -       int i;
> +       u32 seeds[4];
>
> -       for_each_possible_cpu(i) {
> -               struct rnd_state *state = per_cpu_ptr(pcpu_state, i);
> -               u32 seeds[4];
> +       get_random_bytes(&seeds, sizeof(seeds));
> +       state->s1 = __seed(seeds[0],   2U);
> +       state->s2 = __seed(seeds[1],   8U);
> +       state->s3 = __seed(seeds[2],  16U);
> +       state->s4 = __seed(seeds[3], 128U);
>
> -               get_random_bytes(&seeds, sizeof(seeds));
> -               state->s1 = __seed(seeds[0],   2U);
> -               state->s2 = __seed(seeds[1],   8U);
> -               state->s3 = __seed(seeds[2],  16U);
> -               state->s4 = __seed(seeds[3], 128U);
> +       prandom_warmup(state);
> +}
> +EXPORT_SYMBOL(prandom_seed_full_state_one);
>
> -               prandom_warmup(state);
> -       }
> +void prandom_seed_full_state_all(struct rnd_state __percpu *pcpu_state)
> +{
> +       int i;
> +
> +       for_each_possible_cpu(i)
> +               prandom_seed_full_state_one(per_cpu_ptr(pcpu_state, i));
>  }
> -EXPORT_SYMBOL(prandom_seed_full_state);
> +EXPORT_SYMBOL(prandom_seed_full_state_all);
>
>  #ifdef CONFIG_RANDOM32_SELFTEST
>  static struct prandom_test1 {
> ---8<---
>
