linux-kernel - Re: [RFC PATCH v2 1/4] rseq: Add sched

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20230926205215.472650-1-dvyukov@google.com>
Date:   Tue, 26 Sep 2023 22:52:15 +0200
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     mathieu.desnoyers@...icios.com
Cc:     David.Laight@...LAB.COM, alexander@...alicyn.com,
        andrealmeid@...lia.com, boqun.feng@...il.com, brauner@...nel.org,
        carlos@...hat.com, ckennelly@...gle.com, corbet@....net,
        dancol@...gle.com, dave@...olabs.net, dvhart@...radead.org,
        fweimer@...hat.com, goldstein.w.n@...il.com, hpa@...or.com,
        libc-alpha@...rceware.org, linux-api@...r.kernel.org,
        linux-kernel@...r.kernel.org, longman@...hat.com, mingo@...hat.com,
        paulmck@...nel.org, peterz@...radead.org, pjt@...gle.com,
        posk@...k.io, rostedt@...dmis.org, tglx@...utronix.de
Subject: Re: [RFC PATCH v2 1/4] rseq: Add sched_state field to struct rseq

>> I don't see why we can't stick this directly into struct rseq because
>> it's all public anyway.
>
> The motivation for moving this to a different cache line is to handle 
> the prior comment from Boqun, who is concerned that busy-waiting 
> repeatedly loading a field from struct rseq will cause false-sharing and 
> make other stores to that cache line slower, especially stores to 
> rseq_cs to begin rseq critical sections, thus slightly increasing the 
> overhead of rseq critical sections taken while mutexes are held.
>
> If we want to embed this field into struct rseq with its own cache line, 
> then we need to add a lot of padding, which is inconvenient.
>
> That being said, perhaps this is premature optimization, what do you think ?

Hi Mathieu, Florian,

This is exciting!

I thought the motivation for moving rseq_sched_state out of struct rseq
is lifetime management problem. I assume when a thread locks a mutex,
it stores pointer to rseq_sched_state in the mutex state for other
threads to poll. So the waiting thread would do something along the following
lines:

rseq_sched_state* state = __atomic_load_n(mutex->sched_state, __ATOMIC_RELAXED);
if (state && !(state->state & RSEQ_SCHED_STATE_FLAG_ON_CPU))
	futex_wait();

Now if the state is struct rseq, which is stored in TLS,
then the owning thread can unlock the mutex, exit and unmap TLS in between.
Consequently, load of state->state will cause a paging fault.

And we do want rseq in TLS to save 1 indirection.

If rseq_sched_state is separated from struct rseq, then it can be allocated
in type stable memory that is never unmapped.

What am I missing here?

However, if we can store this state in struct rseq, then an alternative
interface would for the kernel to do:

rseq->cpu_id = -1;

to denote that the thread is not running on any CPU.
I think it kinda makes sense, rseq->cpu_id is the thread's current CPU,
and -1 naturally means "not running at all". And we already store -1
right after init, so it shouldn't be a surprising value.