lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0610d1be-15b4-40a6-8bec-307e62f810bb@efficios.com>
Date: Tue, 26 Aug 2025 11:12:48 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Thomas Gleixner <tglx@...utronix.de>, LKML <linux-kernel@...r.kernel.org>
Cc: Jens Axboe <axboe@...nel.dk>, Peter Zijlstra <peterz@...radead.org>,
 "Paul E. McKenney" <paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
 Paolo Bonzini <pbonzini@...hat.com>, Sean Christopherson
 <seanjc@...gle.com>, Wei Liu <wei.liu@...nel.org>,
 Dexuan Cui <decui@...rosoft.com>, x86@...nel.org,
 Arnd Bergmann <arnd@...db.de>, Heiko Carstens <hca@...ux.ibm.com>,
 Christian Borntraeger <borntraeger@...ux.ibm.com>,
 Sven Schnelle <svens@...ux.ibm.com>, Huacai Chen <chenhuacai@...nel.org>,
 Paul Walmsley <paul.walmsley@...ive.com>, Palmer Dabbelt <palmer@...belt.com>
Subject: Re: [patch V2 25/37] rseq: Rework the TIF_NOTIFY handler

On 2025-08-23 12:40, Thomas Gleixner wrote:
> Replace the whole logic with the new implementation, which is shared with
> signal delivery and the upcoming exit fast path.
> 
> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
> ---
>   kernel/rseq.c |   78 +++++++++++++++++++++++++---------------------------------
>   1 file changed, 34 insertions(+), 44 deletions(-)
> 
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -82,12 +82,6 @@
>   #define CREATE_TRACE_POINTS
>   #include <trace/events/rseq.h>
>   
> -#ifdef CONFIG_MEMBARRIER
> -# define RSEQ_EVENT_GUARD	irq
> -#else
> -# define RSEQ_EVENT_GUARD	preempt
> -#endif
> -
>   DEFINE_STATIC_KEY_MAYBE(CONFIG_RSEQ_DEBUG_DEFAULT_ENABLE, rseq_debug_enabled);
>   
>   static inline void rseq_control_debug(bool on)
> @@ -236,38 +230,15 @@ static bool rseq_handle_cs(struct task_s
>   	return rseq_update_user_cs(t, regs, csaddr);
>   }
>   
> -/*
> - * This resume handler must always be executed between any of:
> - * - preemption,
> - * - signal delivery,
> - * and return to user-space.
> - *
> - * This is how we can ensure that the entire rseq critical section
> - * will issue the commit instruction only if executed atomically with
> - * respect to other threads scheduled on the same CPU, and with respect
> - * to signal handlers.
> - */
> -void __rseq_handle_notify_resume(struct pt_regs *regs)
> +static void rseq_slowpath_update_usr(struct pt_regs *regs)
>   {
> +	/* Preserve rseq state and user_irq state for exit to user */
> +	const struct rseq_event evt_mask = { .has_rseq = true, .user_irq = true, };
>   	struct task_struct *t = current;
>   	struct rseq_ids ids;
>   	u32 node_id;
>   	bool event;
>   
> -	/*
> -	 * If invoked from hypervisors before entering the guest via
> -	 * resume_user_mode_work(), then @regs is a NULL pointer.
> -	 *
> -	 * resume_user_mode_work() clears TIF_NOTIFY_RESUME and re-raises
> -	 * it before returning from the ioctl() to user space when
> -	 * rseq_event.sched_switch is set.
> -	 *
> -	 * So it's safe to ignore here instead of pointlessly updating it
> -	 * in the vcpu_run() loop.
> -	 */
> -	if (!regs)
> -		return;
> -
>   	if (unlikely(t->flags & PF_EXITING))
>   		return;
>   
> @@ -291,26 +262,45 @@ void __rseq_handle_notify_resume(struct
>   	 * with the result handed in to allow the detection of
>   	 * inconsistencies.
>   	 */
> -	scoped_guard(RSEQ_EVENT_GUARD) {
> -		event = t->rseq_event.sched_switch;
> -		t->rseq_event.sched_switch = false;
> +	scoped_guard(irq) {
>   		ids.cpu_id = task_cpu(t);
>   		ids.mm_cid = task_mm_cid(t);
> +		event = t->rseq_event.sched_switch;
> +		t->rseq_event.all &= evt_mask.all;
>   	}
>   
> -	if (!IS_ENABLED(CONFIG_DEBUG_RSEQ) && !event)
> +	if (!event)
>   		return;
>   
> -	if (!rseq_handle_cs(t, regs))
> -		goto error;
> -
>   	node_id = cpu_to_node(ids.cpu_id);
> -	if (!rseq_set_uids(t, &ids, node_id))
> -		goto error;
> -	return;
>   
> -error:
> -	force_sig(SIGSEGV);
> +	if (unlikely(!rseq_update_usr(t, regs, &ids, node_id))) {
> +		/*
> +		 * Clear the errors just in case this might survive magically, but
> +		 * leave the rest intact.
> +		 */
> +		t->rseq_event.error = 0;
> +		force_sig(SIGSEGV);
> +	}
> +}
> +
> +void __rseq_handle_notify_resume(struct pt_regs *regs)
> +{
> +	/*
> +	 * If invoked from hypervisors before entering the guest via
> +	 * resume_user_mode_work(), then @regs is a NULL pointer.
> +	 *
> +	 * resume_user_mode_work() clears TIF_NOTIFY_RESUME and re-raises
> +	 * it before returning from the ioctl() to user space when
> +	 * rseq_event.sched_switch is set.
> +	 *
> +	 * So it's safe to ignore here instead of pointlessly updating it
> +	 * in the vcpu_run() loop.

I don't think any virt user should expect the userspace fields to be
updated on the host process while running in guest mode, but it's good
to clarify that we intend to change this user-visible behavior within
this series, to spare any unwelcome surprise.

Other than that:

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>

Thanks,

Mathieu

> +	 */
> +	if (!regs)
> +		return;
> +
> +	rseq_slowpath_update_usr(regs);
>   }
>   
>   void __rseq_signal_deliver(int sig, struct pt_regs *regs)
> 


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ