Message-ID: <0BF9AF0D-EA88-4504-99E4-BB3674FA588F@oracle.com>
Date: Fri, 19 Sep 2025 17:30:40 +0000
From: Prakash Sangappa <prakash.sangappa@...cle.com>
To: Thomas Gleixner <tglx@...utronix.de>
CC: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
 LKML <linux-kernel@...r.kernel.org>,
 Peter Zilstra <peterz@...radead.org>,
 "Paul E. McKenney" <paulmck@...nel.org>,
 Boqun Feng <boqun.feng@...il.com>,
 Jonathan Corbet <corbet@....net>,
 Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
 K Prateek Nayak <kprateek.nayak@....com>,
 Steven Rostedt <rostedt@...dmis.org>,
 Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
 Arnd Bergmann <arnd@...db.de>,
 "linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
 Florian Weimer <fweimer@...hat.com>,
 "carlos@...hat.com" <carlos@...hat.com>,
 "libc-coord@...ts.openwall.com" <libc-coord@...ts.openwall.com>
Subject: Re: [patch 00/12] rseq: Implement time slice extension mechanism
> On Sep 13, 2025, at 6:02 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
>
> On Fri, Sep 12 2025 at 15:26, Mathieu Desnoyers wrote:
>> On 2025-09-12 12:31, Thomas Gleixner wrote:
>>>> 2) Slice requests are a good fit for locking. Locking typically
>>>> has nesting ability.
>>>>
>>>> We should consider making the slice request ABI a 8-bit
>>>> or 16-bit nesting counter to allow nesting of its users.
>>>
>>> Making request a counter requires to keep request set when the
>>> extension is granted. So the states would be:
>>>
>>>     request    granted
>>>     0          0          Neutral
>>>     >0         0          Requested
>>>     >=0        1          Granted
>>
>
> Second thoughts on this.
>
> Such a scheme means that slice_ctrl.request must be read only for the
> kernel because otherwise the user space decrement would need to be an
> atomic dec_if_not_zero(). We just argued the one atomic operation away. :)
>
> That means, the kernel can only set and clear Granted. That in turn
> loses the information whether a slice extension was denied or revoked,
> which was something the Oracle people wanted to have. I'm not sure
> whether that was a functional or more an instrumentation feature.
The denied indication was mainly instrumentation, for observability: it shows
whether a user application attempts to set 'REQUEST' again without yielding.
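For reference, the atomic dec_if_not_zero() mentioned in the quoted paragraph
above would look roughly like the following in user space. This is only a
sketch, assuming C11 atomics and an 8-bit counter; the helper and its signature
are hypothetical and not part of the proposed ABI.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space helper, not part of the proposed ABI.
 * Decrement *cnt unless it is already zero.
 * Returns true if the decrement happened, false if *cnt was zero.
 */
static bool dec_if_not_zero(_Atomic uint8_t *cnt)
{
	uint8_t old = atomic_load_explicit(cnt, memory_order_relaxed);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak_explicit(cnt, &old,
							  (uint8_t)(old - 1),
							  memory_order_acq_rel,
							  memory_order_relaxed))
			return true;
	}
	return false;
}
```

The compare-and-exchange loop is the cost that a kernel-writable counter would
impose on every release; a plain store is enough with the boolean scheme.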
>
> But what's worse: this is a recipe for disaster as it creates obviously
> subtle and hard to debug ways to leak an increment, which means the
> request would stay active forever defeating the whole purpose.
>
> And no, the kernel cannot keep track of the counter and observe whether
> it became zero at some point or not. You surely could come up with a
> convoluted scheme to work around that in form of sequence counters or
> whatever, but that just creates extra complexity for a very dubious
> value.
>
> The point is that the time slice extension is just providing an
> opportunistic priority ceiling mechanism with low overhead and without
> guarantees.
>
> Once a request is not granted or revoked, the performance of that
> particular operation goes south no matter what. Nesting does not help
> there at all, which is a strong argument for using KISS as the primary
> engineering principle here.
>
> The simple boolean request/granted pair is simple and very well
> defined. It does not suffer from any of those problems.
Agreed, I think keeping the API simple is preferable. The request/granted
sequence makes sense.
>
> If user space wants nesting, then it can do so on its own without
> creating an ill defined and fragile kernel/user ABI. We created enough
> of them in the past and all of them resulted in long term headaches.
I guess user space should be able to handle nesting, possibly without the need
for a counter? AFAICS, can't a nested request to extend the slice be handled by
checking whether both the 'REQUEST' and 'GRANTED' bits are zero? If so, attempt
to request a slice extension. Otherwise, if either the REQUEST or GRANTED bit is
set, a slice extension has already been requested or granted.
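A minimal sketch of that check, assuming a two-field model of the slice control
word. The struct, field names and helpers below are hypothetical stand-ins for
illustration; the real layout is whatever the patch series defines, and the
sketch ignores signal handlers that might run between the check and the store.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the slice control fields (assumption). */
struct slice_ctrl_model {
	volatile uint8_t request;	/* written by user space */
	volatile uint8_t granted;	/* written by the kernel */
};

/*
 * Enter a critical section. Returns true if this call set REQUEST,
 * i.e. the caller is the outermost section and owns the request.
 */
static bool slice_ext_begin(struct slice_ctrl_model *sc)
{
	if (sc->request || sc->granted)
		return false;	/* an outer section already asked */
	sc->request = 1;
	return true;
}

/*
 * Leave a critical section. Only the outermost section clears REQUEST.
 * Returns true if an extension was granted, so the caller should now
 * relinquish the CPU.
 */
static bool slice_ext_end(struct slice_ctrl_model *sc, bool outermost)
{
	if (!outermost)
		return false;
	sc->request = 0;
	return sc->granted != 0;
}
```

The nesting state lives in the caller (the bool returned by slice_ext_begin()),
so no counter is needed in the kernel/user ABI.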
>
>> Handling syscall within granted extension by killing the process
>
> I'm absolutely not opposed to lift the syscall restriction to make
> things easier, but this is the wrong argument for it:
Killing the process seems drastic, and could deter use of this feature.
Can the consequence of making a system call be handled by calling schedule()
in the syscall entry path if an extension was granted, as you were implying?
Thanks
-Prakash
>
>> will likely reserve this feature to the niche use-cases.
>
> Having this used only by people who actually know what they are doing is
> actually the preferred outcome.
>
> We've seen it over and over that supposedly "easy" features result in
> mindless overutilization because everyone and his dog thinks they need
> them just because and for the very wrong reasons. The unconditional
> usage of the most power hungry floating point extensions just because
> they are available, is only one example of many.
>
> Thanks,
>
> tglx