Message-ID: <12342355-b3fb-4e78-ad5b-dcfff1366ccf@kernel.dk>
Date: Wed, 13 Aug 2025 11:45:09 -0600
From: Jens Axboe <axboe@...nel.dk>
To: Thomas Gleixner <tglx@...utronix.de>, LKML <linux-kernel@...r.kernel.org>
Cc: Michael Jeanson <mjeanson@...icios.com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Peter Zijlstra <peterz@...radead.org>, "Paul E. McKenney"
<paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
Wei Liu <wei.liu@...nel.org>
Subject: Re: [patch 00/11] rseq: Optimize exit to user space
On 8/13/25 10:29 AM, Thomas Gleixner wrote:
> With the more widespread usage of rseq in glibc, rseq is no longer a
> niche use case for special applications.
>
> While working on a sane implementation of a rseq based time slice extension
> mechanism, I noticed several shortcomings of the current rseq code:
>
> 1) task::rseq_event_mask is a pointless bitfield despite the fact that
> the ABI flags it was meant to support have been deprecated and
> functionally disabled three years ago.
>
> 2) task::rseq_event_mask accumulates bits unless a critical section is
> discovered in the user space rseq memory. This results in pointless
> invocations of the rseq user space exit handler even if nothing has
> changed. As a matter of correctness these bits have to be clear when
> exiting to user space and therefore pristine when coming back into
> the kernel. Aside from correctness, this also avoids pointless
> evaluation of the user space memory, which is a performance benefit.
>
> 3) The evaluation of critical sections does not differentiate between
> syscall and interrupt/exception exits. The current implementation
> silently fixes up critical sections which invoked a syscall unless
> CONFIG_DEBUG_RSEQ is enabled.
>
> That's just wrong. If user space does that on a production kernel it
> can keep the pieces. The kernel is not there to proliferate mindless
> user space programming and to let everyone pay the performance
> penalty.
>
> This series addresses these issues and on top converts parts of the user
> space access over to the new masked access model, which lowers the overhead
> of Spectre-V1 mitigations significantly on architectures which support it
> (x86 as of today). This is especially noticeable in the access to the
> rseq_cs field in struct rseq, which is the first quick check to figure out
> whether a critical section is installed or not.
>
> It survives the kernel's rseq selftests, but I did not run any performance
> tests against rseq itself because I have no idea how to use the gazillion
> undocumented command line parameters of the benchmark. I leave that to
> people who are so familiar with them that they assume everyone else is too :)
>
> The performance gain on regular workloads is clearly measurable, and the
> consistent event flag state now allows building the time slice extension
> mechanism on top. The first POC I implemented:
>
> https://lore.kernel.org/lkml/87o6smb3a0.ffs@tglx/
>
> suffered badly from the stale event mask bits, and the cleaned up version
> brought a whopping 25+% performance gain.
Thanks for doing this work, it's been on my list to take a look at rseq
as it's quite the pig currently, and it's enabled by default (which I
assume comes from a newer libc).
--
Jens Axboe