Message-ID: <0090fb14-e78f-4b67-8933-bf9ef89ba0d9@efficios.com>
Date: Mon, 25 Aug 2025 15:43:32 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Thomas Gleixner <tglx@...utronix.de>, LKML <linux-kernel@...r.kernel.org>
Cc: Jens Axboe <axboe@...nel.dk>, Peter Zijlstra <peterz@...radead.org>,
 "Paul E. McKenney" <paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
 Paolo Bonzini <pbonzini@...hat.com>, Sean Christopherson
 <seanjc@...gle.com>, Wei Liu <wei.liu@...nel.org>,
 Dexuan Cui <decui@...rosoft.com>, x86@...nel.org,
 Arnd Bergmann <arnd@...db.de>, Heiko Carstens <hca@...ux.ibm.com>,
 Christian Borntraeger <borntraeger@...ux.ibm.com>,
 Sven Schnelle <svens@...ux.ibm.com>, Huacai Chen <chenhuacai@...nel.org>,
 Paul Walmsley <paul.walmsley@...ive.com>, Palmer Dabbelt <palmer@...belt.com>
Subject: Re: [patch V2 37/37] entry/rseq: Optimize for TIF_RSEQ on exit

On 2025-08-23 12:40, Thomas Gleixner wrote:
> Further analysis of the exit path with the separate TIF_RSEQ showed that,
> depending on the workload, a significant number of invocations of
> resume_user_mode_work() end up with no bit set other than TIF_RSEQ.
> 
> On architectures with a separate TIF_RSEQ this can be distinguished and
> checked right at the beginning of the function before entering the loop.
> 
> The quick check is lightweight, so it does not impose a massive penalty on
> non-RSEQ use cases. It just checks that the work is empty except for
> TIF_RSEQ and, if so, jumps right into the handling fast path.
> 
> This is truly the only TIF bit which can be optimized that way, because
> its handling runs only after all the other work has been done. The
> optimization spares a full round trip through the other conditionals and an
> interrupt enable/disable pair. The generated code looks reasonable enough
> to justify this and the resulting numbers do so as well.
> 
> The main beneficiaries are blocking-syscall-heavy workloads, where the
> tasks often end up being scheduled on a different CPU or get a different MM
> CID, but have no other work to handle on return.
> 
> A futex benchmark showed up to 90% shortcut utilization and a measurable
> performance improvement of ~1%. Non-scheduling workloads see neither an
> improvement nor a degradation. A full kernel build shows about 15%
> shortcut utilization, but no measurable side effects in either direction.

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>

> 
> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
> ---
>   include/linux/rseq_entry.h |   14 ++++++++++++++
>   kernel/entry/common.c      |   13 +++++++++++--
>   kernel/rseq.c              |    2 ++
>   3 files changed, 27 insertions(+), 2 deletions(-)
> 
> --- a/include/linux/rseq_entry.h
> +++ b/include/linux/rseq_entry.h
> @@ -11,6 +11,7 @@ struct rseq_stats {
>   	unsigned long	signal;
>   	unsigned long	slowpath;
>   	unsigned long	fastpath;
> +	unsigned long	quicktif;
>   	unsigned long	ids;
>   	unsigned long	cs;
>   	unsigned long	clear;
> @@ -532,6 +533,14 @@ rseq_exit_to_user_mode_work(struct pt_re
>   	return ti_work | _TIF_NOTIFY_RESUME;
>   }
>   
> +static __always_inline bool
> +rseq_exit_to_user_mode_early(unsigned long ti_work, const unsigned long mask)
> +{
> +	if (IS_ENABLED(CONFIG_HAVE_GENERIC_TIF_BITS))
> +		return (ti_work & mask) == CHECK_TIF_RSEQ;
> +	return false;
> +}
> +
>   #endif /* !CONFIG_GENERIC_ENTRY */
>   
>   static __always_inline void rseq_syscall_exit_to_user_mode(void)
> @@ -577,6 +586,11 @@ static inline unsigned long rseq_exit_to
>   {
>   	return ti_work;
>   }
> +
> +static inline bool rseq_exit_to_user_mode_early(unsigned long ti_work, const unsigned long mask)
> +{
> +	return false;
> +}
>   static inline void rseq_note_user_irq_entry(void) { }
>   static inline void rseq_syscall_exit_to_user_mode(void) { }
>   static inline void rseq_irqentry_exit_to_user_mode(void) { }
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -22,7 +22,14 @@ void __weak arch_do_signal_or_restart(st
>   	/*
>   	 * Before returning to user space ensure that all pending work
>   	 * items have been completed.
> +	 *
> +	 * Optimize for TIF_RSEQ being the only bit set.
>   	 */
> +	if (rseq_exit_to_user_mode_early(ti_work, EXIT_TO_USER_MODE_WORK)) {
> +		rseq_stat_inc(rseq_stats.quicktif);
> +		goto do_rseq;
> +	}
> +
>   	do {
>   		local_irq_enable_exit_to_user(ti_work);
>   
> @@ -56,10 +63,12 @@ void __weak arch_do_signal_or_restart(st
>   
>   		ti_work = read_thread_flags();
>   
> +	do_rseq:
>   		/*
>   		 * This returns the unmodified ti_work, when ti_work is not
> -		 * empty. In that case it waits for the next round to avoid
> -		 * multiple updates in case of rescheduling.
> +		 * empty (except for TIF_RSEQ). In that case it waits for
> +		 * the next round to avoid multiple updates in case of
> +		 * rescheduling.
>   		 *
>   		 * When it handles rseq it returns either with empty work
>   		 * on success or with TIF_NOTIFY_RESUME set on failure to
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -134,6 +134,7 @@ static int rseq_stats_show(struct seq_fi
>   		stats.signal	+= data_race(per_cpu(rseq_stats.signal, cpu));
>   		stats.slowpath	+= data_race(per_cpu(rseq_stats.slowpath, cpu));
>   		stats.fastpath	+= data_race(per_cpu(rseq_stats.fastpath, cpu));
> +		stats.quicktif	+= data_race(per_cpu(rseq_stats.quicktif, cpu));
>   		stats.ids	+= data_race(per_cpu(rseq_stats.ids, cpu));
>   		stats.cs	+= data_race(per_cpu(rseq_stats.cs, cpu));
>   		stats.clear	+= data_race(per_cpu(rseq_stats.clear, cpu));
> @@ -144,6 +145,7 @@ static int rseq_stats_show(struct seq_fi
>   	seq_printf(m, "signal: %16lu\n", stats.signal);
>   	seq_printf(m, "slowp:  %16lu\n", stats.slowpath);
>   	seq_printf(m, "fastp:  %16lu\n", stats.fastpath);
> +	seq_printf(m, "quickt: %16lu\n", stats.quicktif);
>   	seq_printf(m, "ids:    %16lu\n", stats.ids);
>   	seq_printf(m, "cs:     %16lu\n", stats.cs);
>   	seq_printf(m, "clear:  %16lu\n", stats.clear);
> 


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
