Message-ID: <0090fb14-e78f-4b67-8933-bf9ef89ba0d9@efficios.com>
Date: Mon, 25 Aug 2025 15:43:32 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Thomas Gleixner <tglx@...utronix.de>, LKML <linux-kernel@...r.kernel.org>
Cc: Jens Axboe <axboe@...nel.dk>, Peter Zijlstra <peterz@...radead.org>,
"Paul E. McKenney" <paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
Paolo Bonzini <pbonzini@...hat.com>, Sean Christopherson
<seanjc@...gle.com>, Wei Liu <wei.liu@...nel.org>,
Dexuan Cui <decui@...rosoft.com>, x86@...nel.org,
Arnd Bergmann <arnd@...db.de>, Heiko Carstens <hca@...ux.ibm.com>,
Christian Borntraeger <borntraeger@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>, Huacai Chen <chenhuacai@...nel.org>,
Paul Walmsley <paul.walmsley@...ive.com>, Palmer Dabbelt <palmer@...belt.com>
Subject: Re: [patch V2 37/37] entry/rseq: Optimize for TIF_RSEQ on exit
On 2025-08-23 12:40, Thomas Gleixner wrote:
> Further analysis of the exit path with the separate TIF_RSEQ showed that
> depending on the workload a significant number of invocations of
> resume_user_mode_work() end up with no bit set other than TIF_RSEQ.
>
> On architectures with a separate TIF_RSEQ this can be distinguished and
> checked right at the beginning of the function before entering the loop.
>
> The quick check is lightweight, so it does not impose a massive penalty on
> non-RSEQ use cases. It just checks whether the work is empty except for
> TIF_RSEQ and, if so, jumps right into the handling fast path.
>
> This is truly the only TIF bit there which can be optimized that way
> because the handling runs only when all the other work has been done. The
> optimization spares a full round trip through the other conditionals and an
> interrupt enable/disable pair. The generated code looks reasonable enough
> to justify this and the resulting numbers do so as well.
>
> The main beneficiaries are blocking-syscall-heavy workloads, where the
> tasks often end up being scheduled on a different CPU or get a different MM
> CID, but have no other work to handle on return.
>
> A futex benchmark showed up to 90% shortcut utilization and a measurable
> performance improvement of ~1%. Non-scheduling workloads see neither an
> improvement nor a degradation. A full kernel build shows about 15%
> shortcuts, but no measurable side effects in either direction.
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
>
> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
> ---
> include/linux/rseq_entry.h | 14 ++++++++++++++
> kernel/entry/common.c | 13 +++++++++++--
> kernel/rseq.c | 2 ++
> 3 files changed, 27 insertions(+), 2 deletions(-)
>
> --- a/include/linux/rseq_entry.h
> +++ b/include/linux/rseq_entry.h
> @@ -11,6 +11,7 @@ struct rseq_stats {
> unsigned long signal;
> unsigned long slowpath;
> unsigned long fastpath;
> + unsigned long quicktif;
> unsigned long ids;
> unsigned long cs;
> unsigned long clear;
> @@ -532,6 +533,14 @@ rseq_exit_to_user_mode_work(struct pt_re
> return ti_work | _TIF_NOTIFY_RESUME;
> }
>
> +static __always_inline bool
> +rseq_exit_to_user_mode_early(unsigned long ti_work, const unsigned long mask)
> +{
> + if (IS_ENABLED(CONFIG_HAVE_GENERIC_TIF_BITS))
> + return (ti_work & mask) == CHECK_TIF_RSEQ;
> + return false;
> +}
> +
> #endif /* !CONFIG_GENERIC_ENTRY */
>
> static __always_inline void rseq_syscall_exit_to_user_mode(void)
> @@ -577,6 +586,11 @@ static inline unsigned long rseq_exit_to
> {
> return ti_work;
> }
> +
> +static inline bool rseq_exit_to_user_mode_early(unsigned long ti_work, const unsigned long mask)
> +{
> + return false;
> +}
> static inline void rseq_note_user_irq_entry(void) { }
> static inline void rseq_syscall_exit_to_user_mode(void) { }
> static inline void rseq_irqentry_exit_to_user_mode(void) { }
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -22,7 +22,14 @@ void __weak arch_do_signal_or_restart(st
> /*
> * Before returning to user space ensure that all pending work
> * items have been completed.
> + *
> + * Optimize for TIF_RSEQ being the only bit set.
> */
> + if (rseq_exit_to_user_mode_early(ti_work, EXIT_TO_USER_MODE_WORK)) {
> + rseq_stat_inc(rseq_stats.quicktif);
> + goto do_rseq;
> + }
> +
> do {
> local_irq_enable_exit_to_user(ti_work);
>
> @@ -56,10 +63,12 @@ void __weak arch_do_signal_or_restart(st
>
> ti_work = read_thread_flags();
>
> + do_rseq:
> /*
> * This returns the unmodified ti_work, when ti_work is not
> - * empty. In that case it waits for the next round to avoid
> - * multiple updates in case of rescheduling.
> + * empty (except for TIF_RSEQ). In that case it waits for
> + * the next round to avoid multiple updates in case of
> + * rescheduling.
> *
> * When it handles rseq it returns either with empty work
> * on success or with TIF_NOTIFY_RESUME set on failure to
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -134,6 +134,7 @@ static int rseq_stats_show(struct seq_fi
> stats.signal += data_race(per_cpu(rseq_stats.signal, cpu));
> stats.slowpath += data_race(per_cpu(rseq_stats.slowpath, cpu));
> stats.fastpath += data_race(per_cpu(rseq_stats.fastpath, cpu));
> + stats.quicktif += data_race(per_cpu(rseq_stats.quicktif, cpu));
> stats.ids += data_race(per_cpu(rseq_stats.ids, cpu));
> stats.cs += data_race(per_cpu(rseq_stats.cs, cpu));
> stats.clear += data_race(per_cpu(rseq_stats.clear, cpu));
> @@ -144,6 +145,7 @@ static int rseq_stats_show(struct seq_fi
> seq_printf(m, "signal: %16lu\n", stats.signal);
> seq_printf(m, "slowp: %16lu\n", stats.slowpath);
> seq_printf(m, "fastp: %16lu\n", stats.fastpath);
> + seq_printf(m, "quickt: %16lu\n", stats.quicktif);
> seq_printf(m, "ids: %16lu\n", stats.ids);
> seq_printf(m, "cs: %16lu\n", stats.cs);
> seq_printf(m, "clear: %16lu\n", stats.clear);
>
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com