Message-ID: <12342355-b3fb-4e78-ad5b-dcfff1366ccf@kernel.dk>
Date: Wed, 13 Aug 2025 11:45:09 -0600
From: Jens Axboe <axboe@...nel.dk>
To: Thomas Gleixner <tglx@...utronix.de>, LKML <linux-kernel@...r.kernel.org>
Cc: Michael Jeanson <mjeanson@...icios.com>,
 Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
 Peter Zijlstra <peterz@...radead.org>, "Paul E. McKenney"
 <paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
 Wei Liu <wei.liu@...nel.org>
Subject: Re: [patch 00/11] rseq: Optimize exit to user space

On 8/13/25 10:29 AM, Thomas Gleixner wrote:
> With the more widespread usage of rseq in glibc, rseq is no longer a
> niche use case for special applications.
> 
> While working on a sane implementation of an rseq-based time slice
> extension mechanism, I noticed several shortcomings of the current rseq code:
> 
>   1) task::rseq_event_mask is still a bitfield, which is pointless because
>      the ABI flags it was meant to support were deprecated and
>      functionally disabled three years ago.
> 
>   2) task::rseq_event_mask is accumulating bits unless there is a critical
>      section discovered in the user space rseq memory. This results in
>      pointless invocations of the rseq user space exit handler even if
>      nothing has changed. As a matter of correctness these bits have to
>      be clear when exiting to user space and therefore pristine when
>      coming back into the kernel. Aside from correctness, this also avoids
>      pointless evaluation of the user space memory, which is a performance
>      benefit.
> 
>   3) The evaluation of critical sections does not differentiate between
>      syscall and interrupt/exception exits. The current implementation
>      silently fixes up critical sections which invoked a syscall unless
>      CONFIG_DEBUG_RSEQ is enabled.
> 
>      That's just wrong. If user space does that on a production kernel it
>      can keep the pieces. The kernel is not there to encourage mindless
>      user space programming and make everyone pay the performance
>      penalty.
> 
> This series addresses these issues and on top converts parts of the user
> space access over to the new masked access model, which lowers the overhead
> of Spectre-V1 mitigations significantly on architectures which support it
> (x86 as of today). This is especially noticeable in the access to the
> rseq_cs field in struct rseq, which is the first quick check to figure out
> whether a critical section is installed or not.
> 
> It survives the kernel's rseq selftests, but I did not run any performance
> tests vs. rseq because I have no idea how to use the gazillion undocumented
> command line parameters of the benchmark. I leave that to people who are so
> familiar with them that they assume everyone else is too :)
> 
> The performance gain on regular workloads is clearly measurable, and the
> consistent event flag state now allows building the time slice extension
> mechanism on top. The first POC I implemented:
> 
>    https://lore.kernel.org/lkml/87o6smb3a0.ffs@tglx/
> 
> suffered badly from the stale event mask bits and the cleaned-up version
> brought a whopping 25+% performance gain.

Thanks for doing this work, it's been on my list to take a look at rseq
as it's quite the pig currently and enabled by default (which I assume
comes from a newer libc).

-- 
Jens Axboe

