lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID: <20250813162824.356621744@linutronix.de>
Date: Wed, 13 Aug 2025 18:29:34 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: LKML <linux-kernel@...r.kernel.org>
Cc: Michael Jeanson <mjeanson@...icios.com>,
 Wei Liu <wei.liu@...nel.org>,
 Jens Axboe <axboe@...nel.dk>,
 Peter Zijlstra <peterz@...radead.org>,
 Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
 "Paul E. McKenney" <paulmck@...nel.org>,
 Boqun Feng <boqun.feng@...il.com>
Subject: [patch 09/11] entry: Provide exit_to_user_notify_resume()

Like all other functionality multiplexed under TIF_NOTIFY_RESUME, the
restartable sequences (rseq) handler is invoked unconditionally whenever
TIF_NOTIFY_RESUME is set, for whatever reason.

The invocation is already conditional on the rseq_event_pending bit being
set, but there is further room for improvement.

When the event bit is set, the invocation itself cannot be avoided. The
heavy lifting of accessing user space can be avoided, however, when the
exit to user mode loop is entered from a syscall, unless the kernel is a
debug kernel. Currently the RSEQ code has no way to distinguish that case.

Providing that information is trivial for all architectures which use the
generic entry code. For all others it is non-trivial work, which is beyond
the scope of this series. Architectures which want to benefit should
finally convert their code over to the generic entry code.

To prepare for that optimization, rename resume_user_mode_work() to
exit_to_user_notify_resume() and add a @from_irq argument, which is
supplied by the caller.
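The rename, together with the compatibility wrapper added further down in the patch, can be modelled as follows. The sim_* functions and struct sim_regs are hypothetical stand-ins. In the patch itself, the wrapper resume_user_mode_work() exists only when CONFIG_GENERIC_ENTRY is not set, and it always passes false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the API change; not kernel code. */
struct sim_regs { int dummy; };

static int sim_last_from_irq = -1;   /* records what the handler last saw */

/* New entry point: the caller states whether this is an interrupt return. */
static void sim_exit_to_user_notify_resume(struct sim_regs *regs, bool from_irq)
{
	(void)regs;                      /* unused in this model */
	sim_last_from_irq = from_irq;
}

/*
 * Old name kept as a wrapper for callers which cannot supply the
 * information; it always reports a syscall return (false).
 */
static void sim_resume_user_mode_work(struct sim_regs *regs)
{
	sim_exit_to_user_notify_resume(regs, false);
}
```

The wrapper keeps existing architecture call sites compiling unchanged. The price is that, for those architectures, the handler can never see from_irq == true.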

Convert the generic entry code and all non-entry-code users, such as
hypervisors and io_uring, to use the new function and supply the correct
information.

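Any NOTIFY_RESUME work which evaluates this new argument has to make the
evaluation dependent on CONFIG_GENERIC_ENTRY. Otherwise there is no
guarantee that the value is correct at all, because architectures without
the generic entry code keep calling resume_user_mode_work(), which always
passes false.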
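The following hypothetical helper shows how NOTIFY_RESUME work could gate its use of @from_irq. In kernel code the check would be the compile-time constant IS_ENABLED(CONFIG_GENERIC_ENTRY). Here it is a plain parameter, an assumption made only so that both configurations can be exercised.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper. In kernel code, generic_entry would be the
 * compile-time constant IS_ENABLED(CONFIG_GENERIC_ENTRY); it is a
 * parameter here only to exercise both configurations.
 */
static bool sim_may_skip_user_access(bool generic_entry, bool from_irq)
{
	/*
	 * Without generic entry, callers go through resume_user_mode_work(),
	 * which passes false even on interrupt returns, so the value carries
	 * no information: always do the full work.
	 */
	if (!generic_entry)
		return false;
	/* Generic entry supplies a correct value: skip only on syscall exit. */
	return !from_irq;
}
```

Only generic_entry == true combined with from_irq == false allows the shortcut. Every other combination falls back to the full work.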

Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
Cc: Wei Liu <wei.liu@...nel.org>
Cc: Jens Axboe <axboe@...nel.dk>
Cc: Peter Zijlstra <peterz@...radead.org>
---
 drivers/hv/mshv_common.c         |    2 +-
 include/linux/resume_user_mode.h |   38 +++++++++++++++++++++++++++-----------
 io_uring/io_uring.h              |    2 +-
 kernel/entry/common.c            |    2 +-
 kernel/entry/kvm.c               |    2 +-
 5 files changed, 31 insertions(+), 15 deletions(-)

--- a/drivers/hv/mshv_common.c
+++ b/drivers/hv/mshv_common.c
@@ -155,7 +155,7 @@ int mshv_do_pre_guest_mode_work(ulong th
 		schedule();
 
 	if (th_flags & _TIF_NOTIFY_RESUME)
-		resume_user_mode_work(NULL);
+		exit_to_user_notify_resume(NULL, false);
 
 	return 0;
 }
--- a/include/linux/resume_user_mode.h
+++ b/include/linux/resume_user_mode.h
@@ -24,21 +24,22 @@ static inline void set_notify_resume(str
 		kick_process(task);
 }
 
-
 /**
- * resume_user_mode_work - Perform work before returning to user mode
- * @regs:		user-mode registers of @current task
+ * exit_to_user_notify_resume - Perform work before returning to user mode
+ * @regs:	user-mode registers of @current task
+ * @from_irq:	If true, this is a return from interrupt; if false, it is
+ *		a syscall return.
  *
- * This is called when %TIF_NOTIFY_RESUME has been set.  Now we are
- * about to return to user mode, and the user state in @regs can be
- * inspected or adjusted.  The caller in arch code has cleared
- * %TIF_NOTIFY_RESUME before the call.  If the flag gets set again
- * asynchronously, this will be called again before we return to
- * user mode.
+ * This is called when %TIF_NOTIFY_RESUME has been set to handle the exit
+ * to user work, which is multiplexed under this TIF bit. The bit is
+ * cleared and work is probed as pending. If the flag gets set again before
+ * exiting to user space, the caller will invoke this again.
  *
- * Called without locks.
+ * Any work invoked here, which wants to make decisions on @from_irq, must
+ * make these decisions dependent on CONFIG_GENERIC_ENTRY to retain the
+ * historical behaviour of resume_user_mode_work().
  */
-static inline void resume_user_mode_work(struct pt_regs *regs)
+static inline void exit_to_user_notify_resume(struct pt_regs *regs, bool from_irq)
 {
 	clear_thread_flag(TIF_NOTIFY_RESUME);
 	/*
@@ -62,4 +63,19 @@ static inline void resume_user_mode_work
 	rseq_handle_notify_resume(regs);
 }
 
+#ifndef CONFIG_GENERIC_ENTRY
+/**
+ * resume_user_mode_work - Perform work before returning to user mode
+ * @regs:		user-mode registers of @current task
+ *
+ * This is a wrapper around exit_to_user_notify_resume() for the existing
+ * call sites in architecture code, which do not use the generic entry
+ * code.
+ */
+static inline void resume_user_mode_work(struct pt_regs *regs)
+{
+	exit_to_user_notify_resume(regs, false);
+}
+#endif
+
 #endif /* LINUX_RESUME_USER_MODE_H */
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -365,7 +365,7 @@ static inline int io_run_task_work(void)
 	if (current->flags & PF_IO_WORKER) {
 		if (test_thread_flag(TIF_NOTIFY_RESUME)) {
 			__set_current_state(TASK_RUNNING);
-			resume_user_mode_work(NULL);
+			exit_to_user_notify_resume(NULL, false);
 		}
 		if (current->io_uring) {
 			unsigned int count = 0;
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -41,7 +41,7 @@ void __weak arch_do_signal_or_restart(st
 			arch_do_signal_or_restart(regs);
 
 		if (ti_work & _TIF_NOTIFY_RESUME)
-			resume_user_mode_work(regs);
+			exit_to_user_notify_resume(regs, from_irq);
 
 		/* Architecture specific TIF work */
 		arch_exit_to_user_mode_work(regs, ti_work);
--- a/kernel/entry/kvm.c
+++ b/kernel/entry/kvm.c
@@ -17,7 +17,7 @@ static int xfer_to_guest_mode_work(struc
 			schedule();
 
 		if (ti_work & _TIF_NOTIFY_RESUME)
-			resume_user_mode_work(NULL);
+			exit_to_user_notify_resume(NULL, false);
 
 		ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work);
 		if (ret)