lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f68741e0-0cc8-4faa-8144-e1786b9591f1@efficios.com>
Date: Mon, 17 Feb 2025 15:21:31 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Dmitry Vyukov <dvyukov@...gle.com>, peterz@...radead.org,
 boqun.feng@...il.com, tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
 dave.hansen@...ux.intel.com, hpa@...or.com, aruna.ramakrishna@...cle.com,
 elver@...gle.com
Cc: "Paul E. McKenney" <paulmck@...nel.org>, x86@...nel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/4] rseq: Make rseq work with protection keys

On 2025-02-17 06:07, Dmitry Vyukov wrote:
> If an application registers rseq, and ever switches to another pkey
> protection (such that the rseq becomes inaccessible), then any
> context switch will cause failure in __rseq_handle_notify_resume()
> attempting to read/write struct rseq and/or rseq_cs. Since context
> switches are asynchronous and are outside of the application control
> (not part of the restricted code scope), temporarily switch to
> premissive pkey register to read/write rseq/rseq_cs, similarly

permissive

> to signal delivery accesses to altstack.
> 
> Signed-off-by: Dmitry Vyukov <dvyukov@...gle.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: "Paul E. McKenney" <paulmck@...nel.org>
> Cc: Boqun Feng <boqun.feng@...il.com>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: Borislav Petkov <bp@...en8.de>
> Cc: Dave Hansen <dave.hansen@...ux.intel.com>
> Cc: "H. Peter Anvin" <hpa@...or.com>
> Cc: Aruna Ramakrishna <aruna.ramakrishna@...cle.com>
> Cc: x86@...nel.org
> Cc: linux-kernel@...r.kernel.org
> ---
>   kernel/rseq.c | 36 ++++++++++++++++++++++++++++++++++++
>   1 file changed, 36 insertions(+)
> 
> diff --git a/kernel/rseq.c b/kernel/rseq.c
> index 442aba29bc4cf..31cd94b370ef3 100644
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -10,6 +10,7 @@
>   
>   #include <linux/sched.h>
>   #include <linux/uaccess.h>
> +#include <linux/pkeys.h>
>   #include <linux/syscalls.h>
>   #include <linux/rseq.h>
>   #include <linux/types.h>
> @@ -403,10 +404,13 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
>   {
>   	struct task_struct *t = current;
>   	int ret, sig;
> +	pkey_reg_t saved;
> +	bool switched_pkey_reg = false;
>   
>   	if (unlikely(t->flags & PF_EXITING))
>   		return;
>   
> +retry:
>   	/*
>   	 * regs is NULL if and only if the caller is in a syscall path.  Skip
>   	 * fixup and leave rseq_cs as is so that rseq_sycall() will detect and
> @@ -419,9 +423,41 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
>   	}
>   	if (unlikely(rseq_update_cpu_node_id(t)))
>   		goto error;
> +	if (switched_pkey_reg)
> +		write_pkey_reg(saved);
>   	return;
>   
>   error:
> +	/*
> +	 * If the application registers rseq, and ever switches to another
> +	 * pkey protection (such that the rseq becomes inaccessible), then
> +	 * any context switch will cause failure here attempting to read/write
> +	 * struct rseq and/or rseq_cs. Since context switches are
> +	 * asynchronous and are outside of the application control
> +	 * (not part of the restricted code scope), we temporarily switch

Remove "we".

> +	 * to premissive pkey register to read/write rseq/rseq_cs,

permissive

> +	 * similarly to signal delivery accesses to altstack.
> +	 *
> +	 * We don't bother to check if the failure really happened due to

Remove "We".

> +	 * pkeys or not, since it does not matter (performance-wise and
> +	 * otherwise).
> +	 *
> +	 * If the restricted code installs rseq_cs in inaccessible to it
> +	 * due to pkeys memory,

This sentence should be reworded.

  we still let this function read the rseq_cs.
> +	 * It's unclear what benefits the resticted code gets by doing this

restricted

> +	 * (it probably already hijacked control flow at this point), and
> +	 * presumably any sane sandbox should prohibit restricted code
> +	 * from accessing struct rseq, and this is still better than
> +	 * terminating the app unconditionally (it always has a choice
> +	 * of not using rseq and pkeys together).

Note that because userspace can complete an rseq critical section
without clearing the rseq_cs pointer, this could happen simply because
the kernel is preempting the task after it has:

1) completed an rseq critical section, without clearing rseq_cs,
2) changed pkey.

So allowing this is important, and I would remove the comment about
hijacked control flow and such. This can happen with normal use of the
ABI.

Thanks,

Mathieu


> +	 */
> +	if (!switched_pkey_reg) {
> +		switched_pkey_reg = true;
> +		saved = switch_to_permissive_pkey_reg();
> +		goto retry;
> +	} else {
> +		write_pkey_reg(saved);
> +	}
>   	sig = ksig ? ksig->sig : 0;
>   	force_sigsegv(sig);
>   }


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ