linux-kernel - Re: [patch V3 02/12] rseq: Add fields and constants for time slice extension

Message-ID: <8181a3e3-f49f-4278-9b15-67211d6151e0@efficios.com>
Date: Mon, 3 Nov 2025 12:00:30 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Thomas Gleixner <tglx@...utronix.de>, LKML <linux-kernel@...r.kernel.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
 "Paul E. McKenney" <paulmck@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
 Jonathan Corbet <corbet@....net>,
 Prakash Sangappa <prakash.sangappa@...cle.com>,
 Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
 K Prateek Nayak <kprateek.nayak@....com>,
 Steven Rostedt <rostedt@...dmis.org>,
 Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
 Arnd Bergmann <arnd@...db.de>, linux-arch@...r.kernel.org,
 Florian Weimer <fweimer@...hat.com>, "carlos@...hat.com"
 <carlos@...hat.com>, Dmitry Vyukov <dvyukov@...gle.com>,
 Marco Elver <elver@...gle.com>, Peter Oskolkov <posk@...k.io>
Subject: Re: [patch V3 02/12] rseq: Add fields and constants for time slice
 extension

On 2025-10-31 16:58, Thomas Gleixner wrote:
> On Fri, Oct 31 2025 at 15:31, Mathieu Desnoyers wrote:
>> On 2025-10-29 09:22, Thomas Gleixner wrote:
>> [...]
>>> +
>>> +The thread has to enable the functionality via prctl(2)::
>>> +
>>> +    prctl(PR_RSEQ_SLICE_EXTENSION, PR_RSEQ_SLICE_EXTENSION_SET,
>>> +          PR_RSEQ_SLICE_EXT_ENABLE, 0, 0);
>>
>> Enabling specifically for each thread requires hooking into thread
>> creation, and is not a good fit for enabling this from executable or
>> library constructor function.
> 
> Where is the problem? It's not rocket science to handle that in user
> space.

Overhead at thread creation is a metric that is closely followed
by glibc developers. If we want fine-grained per-thread control over
the slice extension mechanism, it would be good if we could think of a
way to allow userspace to enable it through either clone3 or rseq, so
we don't add another round-trip to the kernel at thread creation.
This could be done either as an addition to this prctl, or as a
replacement if we don't want to add two ways to do the same thing.

AFAIU executable startup is not in the same ballpark as thread
creation performance-wise, so adding the overhead of an additional
system call there is less frowned upon. This is why I am asking
whether per-thread granularity is needed.
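For reference, a minimal user-space sketch of one alternative to hooking
thread creation: enabling the feature lazily, at most once per thread,
on first use. The hook type, function and counter names are
hypothetical; the hook stands in for the per-thread prctl() call from
the patch. Either way, every thread that uses the feature still pays
one extra kernel round-trip, which is the cost at issue here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical hook: in practice it would issue the per-thread
 * prctl(PR_RSEQ_SLICE_EXTENSION, ...) call proposed in the patch. */
typedef int (*slice_ext_enable_fn)(void);

/* Per-thread state: every new thread starts with these cleared. */
static _Thread_local bool slice_ext_done;
static _Thread_local int slice_ext_result;

/* Process-wide count of enable attempts, i.e. extra kernel round-trips. */
static atomic_ulong slice_ext_enable_calls;

/* Enable the extension for the calling thread at most once and cache the
 * result; a NULL hook only records the attempt (useful for testing). */
int slice_ext_ensure_enabled(slice_ext_enable_fn enable)
{
    if (!slice_ext_done) {
        atomic_fetch_add(&slice_ext_enable_calls, 1);
        slice_ext_result = enable ? enable() : 0;
        slice_ext_done = true;
    }
    return slice_ext_result;
}
```

The lazy variant avoids interposing on pthread_create(), but it only
moves the per-thread system call to first use; it does not remove it.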

> 
>> What is the use-case for enabling it only for a few threads within
>> a process rather than for the entire process ?
> 
> My general approach to all of this is to reduce overhead by default and
> to provide the ability of fine grained control.

That is a sound approach, I agree.

> 
> Using time slice extensions requires special care and a use case which
> justifies the extra work to be done. So those people really can be asked
> to do the extra work of enabling it, no?

I don't mind that it needs to be enabled explicitly at all. What
I am asking here is what the best way to express this enablement
ABI is.

Here I'm just trying to put myself in the shoes of userspace
library developers to see whether the proposed ABI is a good fit, or
whether, because of resource conflicts with other pieces of the
ecosystem or because of overhead, it will be unusable by core
userspace libraries like libc and end up limited to niche users.

> 
> I really don't get your attitude of enabling everything by default and
> thereby inflicting the maximum amount of overhead on everything.
> 
> I've just wasted weeks to cure the fallout of that approach and it's
> still unsatisfying because the whole CID management crud and related
> overhead is there unconditionally with exactly zero users on any
> distro. The special use cases of the uncompilable gurgle tcmalloc and
> the esoteric librseq are not a justification at all to inflict that on
> everyone.
> 
> Sadly nobody noticed when this got merged and now with RSEQ being widely
> used by glibc it's even harder to turn the clock back. I'm still tempted
> to break this half thought out ABI and make CID opt-in and default to
> CID = CPUID if not activated.

That's a good idea. Making mm_cid use cpu_id by default would not break
anything in terms of hard limits. Sure, the values are not close to 0
anymore, but no application should misbehave because of this.
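To illustrate why a cpu_id fallback keeps hard limits intact, here is
a minimal model (hypothetical names, not the kernel's code). It assumes
that both the compact concurrency ID and the CPU number are below the
number of possible CPUs, so a user that sizes per-CID arrays by that
count stays in bounds either way; only the density of indices near 0
is lost.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the value user space would read from the rseq
 * mm_cid field under the proposal: the compact per-mm concurrency ID if
 * the process opted in, otherwise simply the CPU number. */
static inline unsigned int effective_mm_cid(bool cid_opted_in,
                                            unsigned int compact_cid,
                                            unsigned int cpu_id)
{
    return cid_opted_in ? compact_cid : cpu_id;
}

/* Per-CID counters sized by the number of possible CPUs (assumed known). */
struct percid_counters {
    size_t nr_slots;
    unsigned long *slots;
};

/* Increment the slot for @cid; refuse out-of-range indices. */
static bool percid_inc(struct percid_counters *c, unsigned int cid)
{
    if (cid >= c->nr_slots)
        return false;
    c->slots[cid]++;
    return true;
}
```

With the fallback, the index can be any CPU number rather than a dense
range starting at 0, so memory locality suffers, but correctness does
not.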

Then there is the question of how it should be enabled. For mm_cid,
it really only makes sense per-process.

One possibility here would be to introduce an "rseq features" enablement
prctl that affects the entire process. It would have to be done while
the process is single threaded. This could gate both mm_cid and time
slice extension.

> 
> Seriously the kernel is there to manage resources and provide resource
> control, but it's not there to accommodate the laziness of user space
> programmers and to proliferate the 'I envision this to be widely used'
> wishful thinking mindset.

Gating the mm_cid with a default to cpu_id is fine with me.

> 
> That said I'm not completely against making this per process, but then
> it has to be enabled on the main thread _before_ it spawns threads and
> rejected otherwise.

Agreed. And it would be nice if we could achieve rseq feature enablement
in a way that is common to all rseq features (e.g. through a single
prctl option, applying per-process, requiring single-threaded
state).
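A minimal model of the enablement rule discussed here may help pin it
down: one process-wide feature mask, settable only while the process is
still single-threaded, gating both mm_cid and time slice extension.
This is not kernel code; the feature bits, structure, function names
and the choice of EINVAL/EBUSY are all hypothetical.

```c
#include <errno.h>

/* Hypothetical feature bits for a process-wide "rseq features" prctl. */
enum {
    RSEQ_FEAT_MM_CID    = 1u << 0,
    RSEQ_FEAT_SLICE_EXT = 1u << 1,
    RSEQ_FEAT_ALL       = RSEQ_FEAT_MM_CID | RSEQ_FEAT_SLICE_EXT,
};

struct proc_rseq_state {
    unsigned int nr_threads; /* threads currently in the process */
    unsigned int features;   /* enabled feature bits, shared by all threads */
};

/* Accept only known bits, and only while the process is single-threaded,
 * so threads created later simply inherit the process-wide setting.
 * Returns 0 or a negative errno (error codes are an assumption). */
int rseq_features_enable(struct proc_rseq_state *p, unsigned int mask)
{
    if (mask & ~RSEQ_FEAT_ALL)
        return -EINVAL;
    if (p->nr_threads != 1)
        return -EBUSY;
    p->features |= mask;
    return 0;
}

/* Query whether all bits in @f are enabled for the process. */
int rseq_feature_enabled(const struct proc_rseq_state *p, unsigned int f)
{
    return (p->features & f) == f;
}
```

Rejecting the request once a second thread exists avoids having to
retrofit state into threads that are already running.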

> 
> That said I just went down the obvious road of making it opt-in and
> therefore low overhead and flexible by default. That is correct, simple
> and straightforward. No?

As I pointed out above, I'm simply trying to find the way to express
this feature enablement that best fits the userspace ecosystem as
well, without making it too much trouble on the kernel side.

I note your wish to gate mm_cid behind a similar enablement ABI, and
I'm OK with this unless an existing mm_cid user considers it a
significant ABI break. As maintainer of librseq I'm OK with this;
we should ask the tcmalloc maintainers.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com