Message-ID: <b9c17ad6f9372d3c92b9812b5ea3dbaf44e82b01.1739864467.git.dvyukov@google.com>
Date: Tue, 18 Feb 2025 08:43:47 +0100
From: Dmitry Vyukov <dvyukov@...gle.com>
To: mathieu.desnoyers@...icios.com, peterz@...radead.org, boqun.feng@...il.com,
tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, hpa@...or.com, aruna.ramakrishna@...cle.com,
elver@...gle.com
Cc: Dmitry Vyukov <dvyukov@...gle.com>, "Paul E. McKenney" <paulmck@...nel.org>, x86@...nel.org,
linux-kernel@...r.kernel.org
Subject: [PATCH v2 3/4] rseq: Make rseq work with protection keys

If an application registers rseq and later switches to a different pkey
protection (such that struct rseq becomes inaccessible), then any
context switch will cause a failure in __rseq_handle_notify_resume()
when it attempts to read/write struct rseq and/or rseq_cs. Since context
switches are asynchronous and outside of the application's control
(not part of the restricted code scope), temporarily switch to a
permissive pkey register to read/write rseq/rseq_cs, similar to
how signal delivery accesses the altstack.

Signed-off-by: Dmitry Vyukov <dvyukov@...gle.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: "Paul E. McKenney" <paulmck@...nel.org>
Cc: Boqun Feng <boqun.feng@...il.com>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Ingo Molnar <mingo@...hat.com>
Cc: Borislav Petkov <bp@...en8.de>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>
Cc: "H. Peter Anvin" <hpa@...or.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@...cle.com>
Cc: x86@...nel.org
Cc: linux-kernel@...r.kernel.org
---
Changes in v2:
- fixed typos and reworded the comment
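
For readers who want to follow the control flow without a pkeys-capable
machine, below is a minimal userspace sketch of the retry pattern this
patch adds: try the access, on failure switch to a permissive register
and retry once, and always restore the saved register before returning.
Everything prefixed mock_ is hypothetical and exists only for this
illustration. The one-bit-per-key register model is a simplifying
assumption and does not match how any real architecture encodes pkey
permissions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for this sketch; not the kernel's definitions. */
typedef uint32_t mock_pkey_reg_t;

#define MOCK_PKEY_PERMISSIVE 0u		/* assumption: 0 means "allow all keys" */

static mock_pkey_reg_t mock_pkey_reg;	/* models the per-thread pkey register */
static unsigned int mock_rseq_pkey;	/* key protecting the mock rseq area */

/* Models a user access: fails if the key's access-disable bit is set. */
static bool mock_access_rseq(void)
{
	return (mock_pkey_reg & (1u << mock_rseq_pkey)) == 0;
}

/* Loosely models switch_to_permissive_pkey_reg(): returns the old value. */
static mock_pkey_reg_t mock_switch_to_permissive(void)
{
	mock_pkey_reg_t old = mock_pkey_reg;

	mock_pkey_reg = MOCK_PKEY_PERMISSIVE;
	return old;
}

/* Loosely models write_pkey_reg(). */
static void mock_write_pkey_reg(mock_pkey_reg_t val)
{
	mock_pkey_reg = val;
}

/*
 * Returns 0 on success and -1 where the patched function would reach
 * force_sigsegv(). other_fault models a failure unrelated to pkeys
 * (for example, an unmapped rseq area), which the retry cannot fix.
 */
static int mock_handle_notify_resume(bool other_fault)
{
	mock_pkey_reg_t saved = 0;
	bool switched = false;

retry:
	if (!other_fault && mock_access_rseq()) {
		/* Success path: undo the permissive switch if one happened. */
		if (switched)
			mock_write_pkey_reg(saved);
		return 0;
	}

	if (!switched) {
		/* First failure: retry once with a permissive register. */
		switched = true;
		saved = mock_switch_to_permissive();
		goto retry;
	}

	/* Second failure: restore the register and report the error. */
	mock_write_pkey_reg(saved);
	return -1;
}
```

Setting mock_rseq_pkey to 1 and mock_pkey_reg to (1u << 1) before
calling mock_handle_notify_resume(false) exercises the retry path; the
register holds its original value again on return, both when the retry
succeeds and when it fails.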
---
kernel/rseq.c | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/kernel/rseq.c b/kernel/rseq.c
index 442aba29bc4cf..6fc9f799720cd 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -10,6 +10,7 @@
 
 #include <linux/sched.h>
 #include <linux/uaccess.h>
+#include <linux/pkeys.h>
 #include <linux/syscalls.h>
 #include <linux/rseq.h>
 #include <linux/types.h>
@@ -403,10 +404,13 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
 {
 	struct task_struct *t = current;
 	int ret, sig;
+	pkey_reg_t saved;
+	bool switched_pkey_reg = false;
 
 	if (unlikely(t->flags & PF_EXITING))
 		return;
 
+retry:
 	/*
 	 * regs is NULL if and only if the caller is in a syscall path. Skip
 	 * fixup and leave rseq_cs as is so that rseq_sycall() will detect and
@@ -419,9 +423,43 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
 	}
 	if (unlikely(rseq_update_cpu_node_id(t)))
 		goto error;
+	if (switched_pkey_reg)
+		write_pkey_reg(saved);
 	return;
 
 error:
+	/*
+	 * If the application registers rseq, and ever switches to another
+	 * pkey protection (such that the rseq becomes inaccessible), then
+	 * any context switch will cause failure here attempting to read/write
+	 * struct rseq and/or rseq_cs. Since context switches are
+	 * asynchronous and are outside of the application control
+	 * (not part of the restricted code scope), temporarily switch
+	 * to permissive pkey register to read/write rseq/rseq_cs,
+	 * similarly to signal delivery accesses to altstack.
+	 *
+	 * Don't bother to check if the failure really happened due to
+	 * pkeys or not, since it does not matter (performance-wise and
+	 * otherwise).
+	 *
+	 * Note that if code has write access to struct rseq, it may install
+	 * rseq_cs that is not accessible to it due to pkeys. Still let this
+	 * function read such rseq_cs on behalf of the code circumventing
+	 * pkeys protection. It's unclear what benefits the restricted code
+	 * gets by doing this (it presumably has already hijacked control
+	 * flow at this point, or has arbitrary write primitive to write
+	 * arbitrary values to struct rseq). A sane sandbox should prohibit
+	 * restricted code from accessing struct rseq. Disabling pkeys
+	 * protection is still better than terminating the app
+	 * unconditionally.
+	 */
+	if (!switched_pkey_reg) {
+		switched_pkey_reg = true;
+		saved = switch_to_permissive_pkey_reg();
+		goto retry;
+	} else {
+		write_pkey_reg(saved);
+	}
 	sig = ksig ? ksig->sig : 0;
 	force_sigsegv(sig);
 }
--
2.48.1.601.g30ceb7b040-goog