Message-ID: <20240221020258.1210148-1-jeremy.linton@arm.com>
Date: Tue, 20 Feb 2024 20:02:58 -0600
From: Jeremy Linton <jeremy.linton@....com>
To: linux-arm-kernel@...ts.infradead.org
Cc: catalin.marinas@....com,
	will@...nel.org,
	keescook@...omium.org,
	gustavoars@...nel.org,
	mark.rutland@....com,
	rostedt@...dmis.org,
	arnd@...db.de,
	broonie@...nel.org,
	guohui@...ontech.com,
	Manoj.Iyer@....com,
	linux-kernel@...r.kernel.org,
	linux-hardening@...r.kernel.org,
	Jeremy Linton <jeremy.linton@....com>,
	James Yang <james.yang@....com>,
	Shiyou Huang <shiyou.huang@....com>
Subject: [RFC] arm64: syscall: Direct PRNG kstack randomization

The existing arm64 stack randomization uses the kernel RNG to acquire
5 bits of address space randomization. This is problematic because it
introduces nondeterminism into the syscall path whenever the RNG has to
generate new output or reseed. This shows up as large tail latencies in
some benchmarks and directly affects the minimum RT latencies as seen
by cyclictest.
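
As a rough illustration of where the 5 bits come from, the sketch below
models the 0x1FF mask used in this patch followed by the loss of the low
4 bits to 16-byte SP alignment. The helper names are hypothetical and
this is a userspace model, not the kernel's implementation.

```c
#include <stdint.h>

/* Mask applied to the raw random value in this patch (9 bits). */
#define KSTACK_MASK	0x1ffu
/* Assumption: SP is kept 16-byte aligned, so the low 4 bits are lost. */
#define SP_ALIGN_BITS	4

/* Hypothetical helper: which aligned stack slot a raw value selects. */
static unsigned int kstack_slot(uint32_t raw)
{
	return (raw & KSTACK_MASK) >> SP_ALIGN_BITS;
}

/* Hypothetical helper: number of distinct slots, i.e. 2^5. */
static unsigned int kstack_slot_count(void)
{
	return (KSTACK_MASK >> SP_ALIGN_BITS) + 1;
}
```

Under these assumptions there are 32 distinct offsets, matching the
SP[8:4] comment in invoke_syscall().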

Other architectures use timers/cycle counters for this purpose, which
is sketchy from a randomization perspective: it should be possible to
estimate the value from knowledge of the syscall return time and from
reading the current value of the timer/counter.
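
A toy model of that concern, under loudly stated assumptions: the offset
is taken from the low 9 bits of a counter sampled at syscall entry, and
an observer reads the same counter some bounded number of ticks later.
The function names and the delay window are hypothetical, and no real
architecture's entry code is being modeled here.

```c
#include <stdint.h>

/* Assumed model: 9 counter bits, low 4 dropped by stack alignment. */
static unsigned int slot_from_counter(uint64_t counter)
{
	return (unsigned int)((counter & 0x1ff) >> 4);
}

/*
 * Count the stack slots consistent with a counter value `seen`, read
 * between min_delay and max_delay ticks after the entry-time sample.
 */
static unsigned int candidate_slots(uint64_t seen, unsigned int min_delay,
				    unsigned int max_delay)
{
	uint32_t mask = 0;	/* one bit per possible slot (32 slots) */
	unsigned int count = 0;
	unsigned int d;

	for (d = min_delay; d <= max_delay; d++)
		mask |= 1u << slot_from_counter(seen - d);

	/* Clear the lowest set bit each round to count the set bits. */
	for (; mask; mask &= mask - 1)
		count++;
	return count;
}
```

In this model, an uncertainty window of 32 ticks leaves at most 3 of the
32 slots as candidates, while a window of 512 ticks or more leaves all
32. How narrow the window is in practice is not something this sketch
can answer.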

So, even a weak PRNG should be better than the cycle counter, provided
it is hard to recover enough of the stack offsets to detect the PRNG's
period.

We could choose a 'better' or larger PRNG, going as far as using one of
the CSPRNGs already in the kernel, but the overhead increases
accordingly. Further, there are a few options for reseeding, possibly
outside the syscall path, but is reseeding even useful in this case?
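
For experimenting outside the kernel, here is a minimal userspace sketch
of the same xorshift32 step and 0x1ff mask used in the diff. The struct
and the seed/next helpers are hypothetical stand-ins for the per-CPU
state, and the fallback seed of 1 is an arbitrary choice for this
sketch (the patch instead retries get_random_u32() until it is nonzero).

```c
#include <stdint.h>

/* Same shift triple (13, 17, 5) as the xorshift32() in the patch. */
static uint32_t xorshift32(uint32_t state)
{
	state ^= state << 13;
	state ^= state >> 17;
	state ^= state << 5;
	return state;
}

/* Hypothetical userspace stand-in for the per-CPU kstackrng variable. */
struct kstack_prng {
	uint32_t state;
};

static void kstack_prng_seed(struct kstack_prng *p, uint32_t seed)
{
	/* Zero is a fixed point of xorshift32, so it must never be used. */
	p->state = seed ? seed : 1;
}

static uint16_t kstack_prng_next(struct kstack_prng *p)
{
	p->state = xorshift32(p->state);
	return p->state & 0x1ff;
}
```

Starting from a seed of 1, the first state is 270369 and the first
masked value is 33. A harness like this could be used to study how many
consecutive masked outputs are needed to recover the state, which is the
open question above.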

Reported-by: James Yang <james.yang@....com>
Reported-by: Shiyou Huang <shiyou.huang@....com>
Signed-off-by: Jeremy Linton <jeremy.linton@....com>
---
 arch/arm64/kernel/syscall.c | 55 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 54 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 9a70d9746b66..70143cb8c7be 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -37,6 +37,59 @@ static long __invoke_syscall(struct pt_regs *regs, syscall_fn_t syscall_fn)
 	return syscall_fn(regs);
 }
 
+#ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
+DEFINE_PER_CPU(u32, kstackrng);
+static u32 xorshift32(u32 state)
+{
+	/*
+	 * From the top of page 4 of Marsaglia, "Xorshift RNGs".
+	 * This algorithm is intended to have a period of 2^32 - 1
+	 * and should not be used anywhere outside of this
+	 * code path.
+	 */
+	state ^= state << 13;
+	state ^= state >> 17;
+	state ^= state << 5;
+	return state;
+}
+
+static u16 kstack_rng(void)
+{
+	u32 rng = raw_cpu_read(kstackrng);
+
+	rng = xorshift32(rng);
+	raw_cpu_write(kstackrng, rng);
+	return rng & 0x1ff;
+}
+
+/* Should we reseed? */
+static int kstack_rng_setup(unsigned int cpu)
+{
+	u32 rng_seed;
+
+	do {
+		rng_seed = get_random_u32();
+	} while (!rng_seed);
+	raw_cpu_write(kstackrng, rng_seed);
+	return 0;
+}
+
+static int kstack_init(void)
+{
+	int ret;
+
+	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "arm64/cpuinfo:kstackrandomize",
+				kstack_rng_setup, NULL);
+	if (ret < 0)
+		pr_err("kstack: failed to register rng callbacks.\n");
+	return 0;
+}
+
+arch_initcall(kstack_init);
+#else
+static u16 kstack_rng(void) { return 0; }
+#endif /* CONFIG_RANDOMIZE_KSTACK_OFFSET */
+
 static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
 			   unsigned int sc_nr,
 			   const syscall_fn_t syscall_table[])
@@ -66,7 +119,7 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
 	 *
 	 * The resulting 5 bits of entropy is seen in SP[8:4].
 	 */
-	choose_random_kstack_offset(get_random_u16() & 0x1FF);
+	choose_random_kstack_offset(kstack_rng());
 }
 
 static inline bool has_syscall_work(unsigned long flags)
-- 
2.43.0

