Message-ID: <d4b25a25-f336-40eb-b9de-3e370050b60c@efficios.com>
Date: Mon, 22 Sep 2025 09:55:05 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Prakash Sangappa <prakash.sangappa@...cle.com>,
 Thomas Gleixner <tglx@...utronix.de>
Cc: LKML <linux-kernel@...r.kernel.org>, Peter Zijlstra
 <peterz@...radead.org>, "Paul E. McKenney" <paulmck@...nel.org>,
 Boqun Feng <boqun.feng@...il.com>, Jonathan Corbet <corbet@....net>,
 Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
 K Prateek Nayak <kprateek.nayak@....com>,
 Steven Rostedt <rostedt@...dmis.org>,
 Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
 Arnd Bergmann <arnd@...db.de>,
 "linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
 Michael Jeanson <mjeanson@...icios.com>
Subject: Re: [patch 02/12] rseq: Add fields and constants for time slice
 extension

On 2025-09-22 01:28, Prakash Sangappa wrote:
> 
> 
>> On Sep 8, 2025, at 3:59 PM, Thomas Gleixner <tglx@...utronix.de> wrote:
>>
> ..
>> +enum rseq_slice_masks {
>> + RSEQ_SLICE_EXT_REQUEST = (1U << RSEQ_SLICE_EXT_REQUEST_BIT),
>> + RSEQ_SLICE_EXT_GRANTED = (1U << RSEQ_SLICE_EXT_GRANTED_BIT),
>> };
>>
>> /*
>> @@ -142,6 +164,12 @@ struct rseq {
>> __u32 mm_cid;
>>
>> /*
>> + * Time slice extension control word. CPU local atomic updates from
>> + * kernel and user space.
>> + */
>> + __u32 slice_ctrl;
> 
> We intend to backport the slice extension feature to older kernel versions.
> 
> With a new structure member for slice control, could there be a discrepancy
> with the rseq structure size (older version) registered by libc? In that case the application
> may not be able to use the slice extension feature unless libc's use of rseq is disabled.

The rseq extension scheme allows this to seamlessly work.

You will need glibc 2.41+, which uses getauxval(3) with
AT_RSEQ_FEATURE_SIZE and AT_RSEQ_ALIGN to query the feature size
supported by the Linux kernel. It allocates a per-thread memory
area which is large enough to support that feature set, and
registers it with the kernel through rseq(2) on thread creation.
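
To illustrate the query described above, here is a minimal sketch of how
a program could size a per-thread rseq area from the auxiliary vector.
This is not glibc's actual code: the helper names rseq_area_size() and
rseq_query_area_size() and the RSEQ_ORIG_SIZE macro are hypothetical, and
the fallback constants assume the values from linux/auxvec.h.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/auxv.h>

/* Fallbacks for older headers; assumed to match linux/auxvec.h. */
#ifndef AT_RSEQ_FEATURE_SIZE
#define AT_RSEQ_FEATURE_SIZE 27
#endif
#ifndef AT_RSEQ_ALIGN
#define AT_RSEQ_ALIGN 28
#endif

/* Hypothetical name for the "original" 32-byte registration size. */
#define RSEQ_ORIG_SIZE 32u

/*
 * Size of the area to allocate: never smaller than the original
 * 32 bytes, rounded up to the requested (power-of-two) alignment.
 */
static size_t rseq_area_size(size_t feature_size, size_t align)
{
    size_t size = feature_size > RSEQ_ORIG_SIZE ? feature_size : RSEQ_ORIG_SIZE;

    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Query the running kernel through the auxiliary vector. */
static size_t rseq_query_area_size(void)
{
    size_t feature_size = getauxval(AT_RSEQ_FEATURE_SIZE);
    size_t align = getauxval(AT_RSEQ_ALIGN);

    /* getauxval() returns 0 when the kernel does not provide the entry. */
    if (align == 0)
        align = RSEQ_ORIG_SIZE;
    return rseq_area_size(feature_size, align);
}
```

On a kernel without the extensible scheme both auxv entries read as 0,
so this sketch falls back to the original 32-byte size and alignment.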

Note that before we had the extensible rseq scheme, glibc registered
a 32-byte structure (including padding at the end), which is considered
as the rseq "original" registration size.

The "mm_cid" field ends at 28 bytes, which leaves 4 bytes of padding at
the end of the original rseq structure. Considering that the time slice
extension fields will likely fit within those 4 bytes, I expect that
applications linked against glibc [2.35, 2.40] will also be able to use
those fields. Those applications should use getauxval(3)
AT_RSEQ_FEATURE_SIZE to validate whether the kernel populates this field
or if it's just padding.
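
The offsets mentioned above can be checked at compile time with a local
mirror of the layout. This is only a sketch: struct rseq_layout_sketch is
a hypothetical stand-in for the UAPI struct rseq, and the placement of
slice_ctrl right after mm_cid is taken from the quoted patch, so it may
change before the series is merged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the fields up to the proposed slice_ctrl. */
struct rseq_layout_sketch {
    uint32_t cpu_id_start;  /* offset 0 */
    uint32_t cpu_id;        /* offset 4 */
    uint64_t rseq_cs;       /* offset 8 */
    uint32_t flags;         /* offset 16 */
    uint32_t node_id;       /* offset 20 */
    uint32_t mm_cid;        /* offset 24, ends at 28 */
    uint32_t slice_ctrl;    /* offset 28, assumed from the quoted patch */
} __attribute__((aligned(32)));

static_assert(offsetof(struct rseq_layout_sketch, mm_cid) + sizeof(uint32_t) == 28,
              "mm_cid ends at byte 28");
static_assert(offsetof(struct rseq_layout_sketch, slice_ctrl) == 28,
              "slice_ctrl starts in the former padding");
static_assert(sizeof(struct rseq_layout_sketch) == 32,
              "everything fits in the original 32-byte area");

/* End offset of the proposed field, i.e. the feature size it requires. */
static size_t slice_ctrl_end_offset(void)
{
    return offsetof(struct rseq_layout_sketch, slice_ctrl) + sizeof(uint32_t);
}
```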

Note that this all works even if you backport the feature to an older kernel:
the rseq extension scheme does not depend on querying the kernel version at
all. You will however be required to backport the support for additional
rseq fields that come before the time slice, such as node_id and mm_cid,
if they are not implemented in your older kernel.

> 
> Applications would have to verify the structure size, so should that be mentioned in the
> documentation?

Yes, applications should check that glibc's __rseq_size is large enough to fit
the new slice field(s), *and*, in the special case of the original rseq size
(32 bytes including padding), they also need to query getauxval(3)
AT_RSEQ_FEATURE_SIZE to make sure the field is indeed supported.
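
A minimal sketch of that two-step check, written as a pure function so
the inputs are explicit. The function name slice_ctrl_usable() and both
macros are hypothetical, and SLICE_CTRL_END assumes the field occupies
bytes 28 to 32 as in the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RSEQ_ORIG_SIZE 32u  /* original registration size, with padding */
#define SLICE_CTRL_END 32u  /* assumed end offset of slice_ctrl */

static_assert(SLICE_CTRL_END <= RSEQ_ORIG_SIZE,
              "sketch assumes slice_ctrl fits in the original padding");

/*
 * registered_size:     size of the area libc registered (__rseq_size).
 * kernel_feature_size: getauxval(AT_RSEQ_FEATURE_SIZE), 0 if absent.
 */
static bool slice_ctrl_usable(size_t registered_size, size_t kernel_feature_size)
{
    /* The registered area must cover the field at all. */
    if (registered_size < SLICE_CTRL_END)
        return false;

    /*
     * Original-size special case: the last 4 bytes may be plain padding,
     * so the kernel has to confirm that it populates them.
     */
    if (registered_size == RSEQ_ORIG_SIZE)
        return kernel_feature_size >= SLICE_CTRL_END;

    /* Larger areas were sized from the kernel's own feature size. */
    return true;
}
```

In a real program the first argument would come from glibc's __rseq_size
and the second from getauxval(AT_RSEQ_FEATURE_SIZE).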

> Also, perhaps make the prctl() enable call return an error if the structure size
> does not match?

That's not how the extensible scheme works.

Either glibc registers a 32-byte area (in which the time slice feature would
fit), or it registers an area large enough to fit all kernel supported features,
or it fails registration. And prctl() is per-process, whereas the rseq registration
is per-thread, so it's kind of weird to make prctl() fail if the current
thread's rseq is not registered.

> 
> With regard to the application determining the address and size of the rseq structure
> registered by libc, what are your thoughts on getting that through the rseq(2)
> system call or a prctl() call instead of dealing with the __weak symbols as was discussed here.
> 
> https://lore.kernel.org/all/F9DBABAD-ABF0-49AA-9A38-BD4D2BE78B94@oracle.com/

I think that the other leg of that email thread got to a resolution of both static and
dynamic use-cases through use of an extern __weak symbol, no [1] ? Not that I am against
adding a rseq(2) query for rseq address, size, and signature, but I just want to double
check that it would be there for convenience and is not actually needed in the typical
use-cases.
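
For readers who have not followed [1], the weak-symbol approach can be
sketched roughly as follows. This is an assumption-laden illustration,
not the code from that thread: the helper names are hypothetical, and the
declarations are written locally (instead of including <sys/rseq.h>) so
that the program still links against a libc that does not export them.

```c
#include <stddef.h>

/*
 * Weak references: if libc does not define these symbols, their
 * addresses resolve to NULL instead of causing a link failure.
 */
extern const unsigned int __rseq_size __attribute__((weak));
extern const ptrdiff_t __rseq_offset __attribute__((weak));

/* Returns 1 when libc exports its rseq ABI symbols, 0 otherwise. */
static int libc_rseq_symbols_present(void)
{
    return &__rseq_size != NULL && &__rseq_offset != NULL;
}

/*
 * Size reported by libc for its registered rseq area, or 0 when the
 * symbols are missing (libc may also report 0 if registration is off).
 */
static unsigned int libc_rseq_size(void)
{
    return libc_rseq_symbols_present() ? __rseq_size : 0;
}
```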

Thanks,

Mathieu

[1] https://lore.kernel.org/all/aKPFIQwg5zxSS5oS@google.com/

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
