lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2cf3774b-a568-0296-651f-7edf6bf03308@efficios.com>
Date:   Thu, 28 Sep 2023 18:11:19 -0400
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org,
        "Paul E . McKenney" <paulmck@...nel.org>,
        Boqun Feng <boqun.feng@...il.com>,
        "H . Peter Anvin" <hpa@...or.com>, Paul Turner <pjt@...gle.com>,
        linux-api@...r.kernel.org, Christian Brauner <brauner@...nel.org>,
        Florian Weimer <fw@...eb.enyo.de>, David.Laight@...LAB.COM,
        carlos@...hat.com, Peter Oskolkov <posk@...k.io>,
        Alexander Mikhalitsyn <alexander@...alicyn.com>,
        Chris Kennelly <ckennelly@...gle.com>,
        Ingo Molnar <mingo@...hat.com>,
        Darren Hart <dvhart@...radead.org>,
        Davidlohr Bueso <dave@...olabs.net>,
        André Almeida <andrealmeid@...lia.com>,
        libc-alpha@...rceware.org, Steven Rostedt <rostedt@...dmis.org>,
        Jonathan Corbet <corbet@....net>,
        Noah Goldstein <goldstein.w.n@...il.com>,
        Daniel Colascione <dancol@...gle.com>, longman@...hat.com,
        Florian Weimer <fweimer@...hat.com>
Subject: Re: [RFC PATCH v2 1/4] rseq: Add sched_state field to struct rseq

On 9/28/23 16:54, Thomas Gleixner wrote:
> On Mon, May 29 2023 at 15:14, Mathieu Desnoyers wrote:
>> +/*
>> + * rseq_sched_state should be aligned on the cache line size.
>> + */
>> +struct rseq_sched_state {
>> +	/*
>> +	 * Version of this structure. Populated by the kernel, read by
>> +	 * user-space.
>> +	 */
>> +	__u32 version;
>> +	/*
>> +	 * The state is updated by the kernel. Read by user-space with
>> +	 * single-copy atomicity semantics. This field can be read by any
>> +	 * userspace thread. Aligned on 32-bit. Contains a bitmask of enum
>> +	 * rseq_sched_state_flags. This field is provided as a hint by the
>> +	 * scheduler, and requires that the page holding this state is
>> +	 * faulted-in for the state update to be performed by the scheduler.
>> +	 */
>> +	__u32 state;
>> +	/*
>> +	 * Thread ID associated with the thread registering this structure.
>> +	 * Initialized by user-space before registration.
>> +	 */
>> +	__u32 tid;
>> +};
>> +
>>   /*
>>    * struct rseq is aligned on 4 * 8 bytes to ensure it is always
>>    * contained within a single cache-line.
>> @@ -148,6 +180,15 @@ struct rseq {
>>   	 */
>>   	__u32 mm_cid;
>>   
>> +	__u32 padding1;
>> +
>> +	/*
>> +	 * Restartable sequences sched_state_ptr field. Initialized by
>> +	 * userspace to the address at which the struct rseq_sched_state is
>> +	 * located. Read by the kernel on rseq registration.
>> +	 */
>> +	__u64 sched_state_ptr;
>> +
> 
> Why is this a separate entity instead of being simply embedded into
> struct rseq?
> 
> Neither the code comment nor the changelog tells anything about that.

Here is the email thread from v1 that led to this:

https://lore.kernel.org/lkml/ZGevZxOjJLMO9zlM@boqun-archlinux/

The reason for moving this sched_state to its own structure was to 
optimize for a scenario where we have:

- many threads contending for the lock, thus loading the value of the 
struct rseq "sched_state".
- the lock owner thread doing rseq critical sections with the lock held, 
thus updating the value of struct rseq "rseq_cs" field (in the same 
cache line).

The loads of the sched_state from other threads would slow down (even 
slightly) the update to the rseq_cs field, thus causing the critical 
sections to take a few more cycles.

I am not convinced that the complexity vs speed tradeoff of moving this 
into its own separate structure is worth it though. Especially given 
that it would require userspace to wire things up with an additional 
side-structure, rather than just extend naturally with the 
extensible-rseq ABI. Another approach would be to cacheline-align this 
field within struct rseq and waste space to padding.

I am inclined to just embed this into struct rseq without caring too 
much about slight overhead caused by sharing the cache line with other 
fields.

> 
> If your intention was to provide a solution for process shared futexes
> then you completely failed to understand how process shared futexes
> work. If not, then please explain why you need a pointer and the
> associated hackery to deal with it.

I have not given a single thought to shared futexes in this PoC so far. :)

So let's see: we could do the following to support shared futexes: move 
the information about the owner's struct rseq (could be the thread 
pointer of the owner thread) to a field separate from the "lock" state 
(which is ABI as you pointed out). Therefore, grabbing the lock would be 
done with an atomic instruction, and then setting this extra information 
about lock owner would be done with a simple store.

Given that the lock owner information is only used to statistically 
decide whether other threads should spin or block when contended, we 
don't really care if this information is set a few instructions after 
acquiring the lock.

Thoughts ?

Thanks for the feedback,

Mathieu

> 
> Thanks,
> 
>          tglx
> 
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ