linux-kernel - Re: [patch 00/12] rseq: Implement time slice extension mechanism

Message-ID: <874it6qzd0.ffs@tglx>
Date: Sat, 13 Sep 2025 15:02:51 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, LKML
 <linux-kernel@...r.kernel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, "Paul E. McKenney"
 <paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>, Jonathan Corbet
 <corbet@....net>, Prakash Sangappa <prakash.sangappa@...cle.com>, Madadi
 Vineeth Reddy <vineethr@...ux.ibm.com>, K Prateek Nayak
 <kprateek.nayak@....com>, Steven Rostedt <rostedt@...dmis.org>, Sebastian
 Andrzej Siewior <bigeasy@...utronix.de>, Arnd Bergmann <arnd@...db.de>,
 linux-arch@...r.kernel.org, Florian Weimer <fweimer@...hat.com>,
 "carlos@...hat.com" <carlos@...hat.com>, libc-coord@...ts.openwall.com
Subject: Re: [patch 00/12] rseq: Implement time slice extension mechanism

On Fri, Sep 12 2025 at 15:26, Mathieu Desnoyers wrote:
> On 2025-09-12 12:31, Thomas Gleixner wrote:
>>> 2) Slice requests are a good fit for locking. Locking typically
>>>      has nesting ability.
>>>
>>>      We should consider making the slice request ABI an 8-bit
>>>      or 16-bit nesting counter to allow nesting of its users.
>> 
>> Making request a counter requires keeping request set when the
>> extension is granted. So the states would be:
>> 
>>       request    granted
>>       0          0               Neutral
>>       >0         0               Requested
>>       >=0        1               Granted
>

Second thoughts on this.

Such a scheme means that slice_ctrl.request must be read-only for the
kernel, because otherwise the user space decrement would need to be an
atomic dec_if_not_zero(). We just argued that one atomic operation away. :)
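To make the atomicity argument concrete, here is a minimal C11 sketch of what a user space dec_if_not_zero() would have to look like if the kernel were also allowed to clear a counter-style request. The function name follows the text above; the 16-bit width and the use of <stdatomic.h> compare-and-exchange are assumptions for illustration, not code from the series.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch only: decrement a hypothetical 16-bit request counter unless it
 * is already zero. If the kernel could concurrently clear the counter, a
 * blind decrement after the clear would underflow, and a plain
 * "load, subtract, store" could overwrite the kernel's clear and
 * resurrect the request. Hence a compare-and-exchange loop.
 */
static bool dec_if_not_zero(_Atomic uint16_t *v)
{
    uint16_t old = atomic_load_explicit(v, memory_order_relaxed);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak_explicit(v, &old,
                                                  (uint16_t)(old - 1),
                                                  memory_order_release,
                                                  memory_order_relaxed))
            return true;
    }
    return false;   /* already zero: nothing to drop */
}
```

Every exit from a nested section would pay for this loop, which is exactly the atomic operation a plain boolean request avoids.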

That means the kernel can only set and clear Granted. That in turn
loses the information whether a slice extension was denied or revoked,
which was something the Oracle people wanted to have. I'm not sure
whether that was a functional requirement or more of an instrumentation
feature.

But what's worse: this is a recipe for disaster, as it obviously creates
subtle and hard-to-debug ways to leak an increment, which means the
request would stay active forever, defeating the whole purpose.
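A contrived illustration of such a leak, assuming a hypothetical per-thread counter `slice_request` standing in for a counter-style request field (not the ABI from the series): a single early return on an error path is enough to leave the request permanently set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter-style request: nonzero means "extension requested". */
static _Thread_local unsigned int slice_request;

static int do_locked_work(bool fail)
{
    slice_request++;          /* enter critical section: raise the request */
    if (fail)
        return -1;            /* BUG: error path skips the decrement */
    slice_request--;          /* leave critical section: drop the request */
    return 0;
}
```

After one failing call the counter never returns to zero, so every later critical section in that thread runs with a stale request.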

And no, the kernel cannot keep track of the counter and observe whether
it became zero at some point or not. You could surely come up with a
convoluted scheme to work around that in the form of sequence counters or
whatever, but that just creates extra complexity for very dubious
value.

The point is that the time slice extension is just providing an
opportunistic priority ceiling mechanism with low overhead and without
guarantees.

Once a request is denied or a granted extension is revoked, the
performance of that particular operation goes south no matter what.
Nesting does not help there at all, which is a strong argument for using
KISS as the primary engineering principle here.

The boolean request/granted pair is simple and very well defined. It
does not suffer from any of those problems.

If user space wants nesting, it can implement that on its own without
creating an ill-defined and fragile kernel/user ABI. We have created
enough of those in the past, and all of them resulted in long-term
headaches.
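As a rough illustration of nesting done purely in user space, the sketch below keeps a private depth counter and only touches a boolean request flag at the outermost level. The `slice_ctrl_sketch` structure, the `slice_begin()`/`slice_end()` helpers and the field layout are hypothetical; the real request/granted bits live in the rseq area as defined by the patch series, and what user space must do when an extension was actually granted is deliberately not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-thread boolean request/granted pair. */
struct slice_ctrl_sketch {
    volatile bool request;   /* set and cleared by user space */
    volatile bool granted;   /* written by the kernel; unused in this sketch */
};

static _Thread_local struct slice_ctrl_sketch slice_ctrl;
static _Thread_local unsigned int slice_depth;   /* never seen by the kernel */

static void slice_begin(void)
{
    /* Only the outermost level raises the request. */
    if (slice_depth++ == 0)
        slice_ctrl.request = true;
}

static void slice_end(void)
{
    assert(slice_depth > 0);
    /* Only the outermost level drops the request. Handing back a granted
     * extension is defined by the series and not shown here. */
    if (--slice_depth == 0)
        slice_ctrl.request = false;
}
```

The nesting state stays private to the process, so the kernel/user ABI remains the plain boolean pair; a leaked depth count is then a bug in the application's own library rather than in the ABI contract.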

> Handling syscall within granted extension by killing the process

I'm absolutely not opposed to lifting the syscall restriction to make
things easier, but this is the wrong argument for it:

> will likely reserve this feature to the niche use-cases.

Having this used only by people who actually know what they are doing is
actually the preferred outcome.

We've seen it over and over that supposedly "easy" features result in
mindless overutilization, because everyone and his dog thinks they need
them just because, and for entirely the wrong reasons. The unconditional
usage of the most power-hungry floating point extensions just because
they are available is only one example of many.

Thanks,

        tglx