Message-ID: <a5eaf7fb-bf09-4d66-90c7-03cc5803ff68@efficios.com>
Date: Thu, 4 Sep 2025 13:54:21 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Thomas Gleixner <tglx@...utronix.de>, LKML <linux-kernel@...r.kernel.org>
Cc: Jens Axboe <axboe@...nel.dk>, Peter Zijlstra <peterz@...radead.org>,
"Paul E. McKenney" <paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
Paolo Bonzini <pbonzini@...hat.com>, Sean Christopherson
<seanjc@...gle.com>, Wei Liu <wei.liu@...nel.org>,
Dexuan Cui <decui@...rosoft.com>, x86@...nel.org,
Arnd Bergmann <arnd@...db.de>, Heiko Carstens <hca@...ux.ibm.com>,
Christian Borntraeger <borntraeger@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>, Huacai Chen <chenhuacai@...nel.org>,
Paul Walmsley <paul.walmsley@...ive.com>, Palmer Dabbelt <palmer@...belt.com>
Subject: Re: [patch V2 28/37] rseq: Switch to fast path processing on exit to
user

On 2025-09-02 14:36, Thomas Gleixner wrote:
> On Wed, Aug 27 2025 at 09:45, Mathieu Desnoyers wrote:
>> On 2025-08-26 11:40, Mathieu Desnoyers wrote:
>>>>   RSEQ selftests       Before              After       Reduction
>>>>
>>>>   exit to user:     386281778          387373750
>>>>   signal checks:     35661203                  0            100%
>>>>   slowpath runs:    140542396 36.38%         100  0.00%     100%
>>>>   fastpath runs:                         9509789  2.51%      N/A
>>>>   id updates:       176203599 45.62%     9087994  2.35%      95%
>>>>   cs checks:        175587856 45.46%     4728394  1.22%      98%
>>>>   cs cleared:       172359544 98.16%     1319307 27.90%      99%
>>>>   cs fixup:           3228312  1.84%     3409087 72.10%
>>
>> By the way, you should really not be using the entire rseq selftests
>> as a representative workload for profiling the kernel rseq implementation.
>>
>> Those selftests include "loop injection", "yield injection", "kill
>> injection" and "sleep injection" within the relevant userspace code
>> paths, which really increase the likelihood of hitting stuff like
>> "cs fixup" compared to anything that comes close to a realistic
>> use-case. This is really useful for testing correctness, but not
>> for profiling. For instance, the "loop injection" introduces busy
>> loops within rseq critical sections to significantly increase the
>> likelihood of hitting a cs fixup.
>>
>> Those specific selftests are really just "stress-tests" that don't
>> represent any relevant workload.
>
> True, they still tell how much useless work the kernel was doing, no?

Somewhat, but they misrepresent what should be considered fast vs slow
paths, and thus what the relevant optimization targets are.

Let me try to explain my thinking further through a comparison with
a periodic task scenario.

Let's suppose you have a periodic task that runs once per day in normal
workloads, and you alter its period in a stress test so it runs every
10ms, to make sure you hit race conditions quickly for testing
purposes. That changes its frequency by a factor of 8.64 million
(86400s / 10ms). Of course this periodic task will then show up in the
profiles as if it were a fast path, but that's only because the
stress-test setup made it run very frequently.

Running busy loops within rseq critical sections is similar: the loops
exist to trigger aborts on purpose, so aborts happen far more often
than they would in any workload that is not deliberately trying to
provoke them.
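
To make that concrete, here is a rough conceptual sketch in C (this is
*not* the actual selftest code; the names do_inject_loops and
percpu_increment_attempt are made up for illustration, and the real
critical section is inline asm registered through rseq). The only
point is that the injected loop widens the window between "prepare"
and the single committing store, which makes preemption, and therefore
an abort followed by a cs fixup, far more likely than in a realistic
workload:

#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>

#define NR_CPUS_MAX	512

/* Loop count injected inside the critical section; 0 in normal runs. */
static unsigned long do_inject_loops;

static intptr_t percpu_count[NR_CPUS_MAX];

/*
 * Conceptual shape of one critical section attempt.  In the real
 * selftests, preemption, signal delivery or migration between
 * "prepare" and "commit" makes the kernel divert execution to the
 * abort handler (a "cs fixup") instead of letting the committing
 * store execute.
 */
static int percpu_increment_attempt(int cpu)
{
	intptr_t expect = percpu_count[cpu];		/* prepare */

	/*
	 * Injected busy loop: it enormously widens the preemption
	 * window, which is why cs fixups dominate under the
	 * injection stress tests.
	 */
	for (unsigned long i = 0; i < do_inject_loops; i++)
		asm volatile ("" ::: "memory");

	if (sched_getcpu() != cpu)			/* migrated: retry */
		return -1;

	percpu_count[cpu] = expect + 1;			/* commit */
	return 0;
}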

So yes, the work you see there under stress testing is indeed work the
kernel is doing in those situations, but it over-represents the
frequency of rseq aborts, because those aborts are precisely what the
stress tests aim to trigger.

This is why I discourage using the loop/yield/kill/sleep injection
parts of the selftests for profiling purposes, and instead recommend
using the "benchmark" selftests, which are much closer to real-life
workloads.
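
For contrast, the injection-free shape (roughly what the benchmark
build exercises, assuming the injection code is compiled out there)
looks like this conceptual sketch, again with made-up names:

#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>

static intptr_t percpu_count[512];

/*
 * With no injection, the distance between "prepare" and "commit" is a
 * handful of instructions, so being preempted inside the critical
 * section, and hence needing a cs fixup, is rare. That matches the
 * benchmark-style profile where id updates dominate and cs fixups are
 * marginal.
 */
static int percpu_increment_fast(int cpu)
{
	intptr_t expect = percpu_count[cpu];	/* prepare */

	if (sched_getcpu() != cpu)		/* migrated: retry */
		return -1;

	percpu_count[cpu] = expect + 1;		/* commit */
	return 0;
}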

Of course, if you are interested in optimizing the rseq ip fixup code
path, then using the stress tests *is* relevant, because they hit that
code often enough to make it show up as significant in profiles. But
that does not mean the rseq ip fixup scenario happens often enough in
real-life workloads to justify optimizing it.

All that being said, I'm perfectly fine with your improvements; I just
want to clarify which metrics should be considered relevant for
justifying future optimization efforts and for weighing future
optimizations against code complexity.

Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com