lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <0d0e0a0a7136d49af9a8d6a849e1aa4bf086c472.1739790300.git.dvyukov@google.com>
Date: Mon, 17 Feb 2025 12:07:32 +0100
From: Dmitry Vyukov <dvyukov@...gle.com>
To: mathieu.desnoyers@...icios.com, peterz@...radead.org, boqun.feng@...il.com, 
	tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, 
	dave.hansen@...ux.intel.com, hpa@...or.com, aruna.ramakrishna@...cle.com, 
	elver@...gle.com
Cc: Dmitry Vyukov <dvyukov@...gle.com>, "Paul E. McKenney" <paulmck@...nel.org>, x86@...nel.org, 
	linux-kernel@...r.kernel.org
Subject: [PATCH 3/4] rseq: Make rseq work with protection keys

If an application registers rseq, and ever switches to another pkey
protection (such that the rseq becomes inaccessible), then any
context switch will cause failure in __rseq_handle_notify_resume()
attempting to read/write struct rseq and/or rseq_cs. Since context
switches are asynchronous and are outside of the application control
(not part of the restricted code scope), temporarily switch to
premissive pkey register to read/write rseq/rseq_cs, similarly
to signal delivery accesses to altstack.

Signed-off-by: Dmitry Vyukov <dvyukov@...gle.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: "Paul E. McKenney" <paulmck@...nel.org>
Cc: Boqun Feng <boqun.feng@...il.com>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Ingo Molnar <mingo@...hat.com>
Cc: Borislav Petkov <bp@...en8.de>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>
Cc: "H. Peter Anvin" <hpa@...or.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@...cle.com>
Cc: x86@...nel.org
Cc: linux-kernel@...r.kernel.org
---
 kernel/rseq.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/kernel/rseq.c b/kernel/rseq.c
index 442aba29bc4cf..31cd94b370ef3 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -10,6 +10,7 @@
 
 #include <linux/sched.h>
 #include <linux/uaccess.h>
+#include <linux/pkeys.h>
 #include <linux/syscalls.h>
 #include <linux/rseq.h>
 #include <linux/types.h>
@@ -403,10 +404,13 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
 {
 	struct task_struct *t = current;
 	int ret, sig;
+	pkey_reg_t saved;
+	bool switched_pkey_reg = false;
 
 	if (unlikely(t->flags & PF_EXITING))
 		return;
 
+retry:
 	/*
 	 * regs is NULL if and only if the caller is in a syscall path.  Skip
 	 * fixup and leave rseq_cs as is so that rseq_sycall() will detect and
@@ -419,9 +423,41 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
 	}
 	if (unlikely(rseq_update_cpu_node_id(t)))
 		goto error;
+	if (switched_pkey_reg)
+		write_pkey_reg(saved);
 	return;
 
 error:
+	/*
+	 * If the application registers rseq, and ever switches to another
+	 * pkey protection (such that the rseq becomes inaccessible), then
+	 * any context switch will cause failure here attempting to read/write
+	 * struct rseq and/or rseq_cs. Since context switches are
+	 * asynchronous and are outside of the application control
+	 * (not part of the restricted code scope), we temporarily switch
+	 * to premissive pkey register to read/write rseq/rseq_cs,
+	 * similarly to signal delivery accesses to altstack.
+	 *
+	 * We don't bother to check if the failure really happened due to
+	 * pkeys or not, since it does not matter (performance-wise and
+	 * otherwise).
+	 *
+	 * If the restricted code installs rseq_cs in inaccessible to it
+	 * due to pkeys memory, we still let this function read the rseq_cs.
+	 * It's unclear what benefits the resticted code gets by doing this
+	 * (it probably already hijacked control flow at this point), and
+	 * presumably any sane sandbox should prohibit restricted code
+	 * from accessing struct rseq, and this is still better than
+	 * terminating the app unconditionally (it always has a choice
+	 * of not using rseq and pkeys together).
+	 */
+	if (!switched_pkey_reg) {
+		switched_pkey_reg = true;
+		saved = switch_to_permissive_pkey_reg();
+		goto retry;
+	} else {
+		write_pkey_reg(saved);
+	}
 	sig = ksig ? ksig->sig : 0;
 	force_sigsegv(sig);
 }
-- 
2.48.1.601.g30ceb7b040-goog


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ