linux-kernel - Re: [RFC PATCH for 4.18 00/16] Restartable Sequences

Message-ID: <1259134501.7268.1532979258894.JavaMail.zimbra@efficios.com>
Date:   Mon, 30 Jul 2018 15:34:18 -0400 (EDT)
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Pavel Machek <pavel@....cz>
Cc:     carlos <carlos@...hat.com>, Florian Weimer <fweimer@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Andy Lutomirski <luto@...capital.net>,
        Dave Watson <davejwatson@...com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        linux-api <linux-api@...r.kernel.org>,
        Paul Turner <pjt@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Russell King <linux@....linux.org.uk>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>, Andrew Hunter <ahh@...gle.com>,
        Andi Kleen <andi@...stfloor.org>, Chris Lameter <cl@...ux.com>,
        Ben Maurer <bmaurer@...com>, rostedt <rostedt@...dmis.org>,
        Josh Triplett <josh@...htriplett.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will.deacon@....com>,
        Michael Kerrisk <mtk.manpages@...il.com>,
        Joel Fernandes <joelaf@...gle.com>
Subject: Re: [RFC PATCH for 4.18 00/16] Restartable Sequences

----- On Jul 30, 2018, at 3:07 PM, Pavel Machek pavel@....cz wrote:

> Hi!
> 
>> > Thanks for pointer.
>> > 
>> > +Restartable sequences are atomic with respect to preemption (making
>> > it
>> > +atomic with respect to other threads running on the same CPU), as
>> > well
>> > +as signal delivery (user-space execution contexts nested over the
>> > same
>> > +thread).
>> > 
>> > So the threads are protected against sigkill when running the
>> > restartable sequence?
>> 
>> In that scenario, SIGKILL _will_ be delivered, hence execution of the
>> rseq critical section will never reach the commit instruction. This
>> follows the guarantee provided that the rseq c.s. either executes
>> completely "atomically" wrt preemption/signal delivery, *or* gets
>> aborted. In this case, sigkill will reap the entire process, so
> 
> The text above does not mention abort -- so I was just making
> sure. Maybe mentioning it would be good idea?

How about this?

Restartable sequences are atomic with respect to preemption (making them
atomic with respect to other threads running on the same CPU), as well
as signal delivery (user-space execution contexts nested over the same
thread). They either complete atomically with respect to preemption on
the current CPU and signal delivery, or they are aborted.

[...]

> 
>> > +Optimistic cache of the CPU number on which the current thread is
>> > +running. Its value is guaranteed to always be a possible CPU number,
>> > +even when rseq is not initialized. The value it contains should
>> > always
>> > +be confirmed by reading the cpu_id field.
>> > 
>> > I'm not sure what "optimistic cache" is...
>> 
>> Perhaps we can find a better wording.
>> 
>> It's "optimistic" in the sense that it's always guaranteed to hold a
>> valid CPU number within the range [ 0 .. nr_possible_cpus - 1 ]. It can
>> therefore be loaded by user-space and then used as an offset, without
>> having to check whether it is within valid bounds compared to the number
>> of possible CPUs in the system.
>> 
>> This works even if the kernel on which the application runs on does not
>> support rseq at all: the __rseq_abi->cpu_id_start field stays initialized at
>> 0, which is indeed a valid CPU number. It's therefore valid to use it as an
>> offset in per-cpu data structures, and only validate whether it's actually the
>> current CPU number by comparing it with the __rseq_abi->cpu_id field
>> within the rseq critical section. If rseq is not available in the kernel,
>> that cpu_id field stays initialized at -1, so the comparison always fails,
>> as intended.
>> 
>> It's then up to user-space to use a fall-back mechanism, considering that
>> rseq is not available.
>> 
>> Advice on improved wording would be welcome.
> 
> Ok, that makes sense, but I'd not understand it from the man
> page. Perhaps your text should be put there?

How about this?

.TP
.in +4n
.I cpu_id_start
Optimistic cache of the CPU number on which the current thread is
running. Its value is guaranteed to always be a possible CPU number,
even when rseq is not initialized. The value it contains should always
be confirmed by reading the cpu_id field.

This field is an optimistic cache in the sense that it is always
guaranteed to hold a valid CPU number in the range [ 0 ..
nr_possible_cpus - 1 ]. It can therefore be loaded by user-space and
used as an offset in per-cpu data structures without having to
check whether its value is within the valid bounds compared to the
number of possible CPUs in the system.

For user-space applications executed on a kernel without rseq support,
the cpu_id_start field remains initialized to 0, which is a valid CPU
number. It is therefore valid to use it as an offset in per-cpu data
structures, and only validate whether it is actually the current CPU
number by comparing it with the cpu_id field within the rseq critical
section. If the kernel does not provide rseq support, the cpu_id field
remains initialized to -1, so the comparison always fails, as intended.

It is then up to user-space to use a fall-back mechanism, considering
that rseq is not available.
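
To illustrate the index-then-validate pattern described above, here is a
minimal sketch in C. It is not the real rseq ABI or a real critical
section: struct rseq_fields_sketch, struct percpu_counter_sketch,
NR_POSSIBLE_CPUS_SKETCH and counter_inc_sketch() are all hypothetical names
made up for this example, and the two fields are filled in by hand rather
than by the kernel. Outside an actual rseq critical section the compare and
the store are of course not atomic with respect to preemption; the sketch
only shows why no bounds check is needed and how the fall-back is reached.

```c
#include <stdint.h>

/* Hypothetical: a real program would size this from the number of possible CPUs. */
#define NR_POSSIBLE_CPUS_SKETCH 4

/* Hypothetical stand-in for the two fields discussed above (not the uapi struct). */
struct rseq_fields_sketch {
	uint32_t cpu_id_start;	/* stays 0 when rseq is unavailable */
	uint32_t cpu_id;	/* stays (uint32_t)-1 when rseq is unavailable */
};

/* Hypothetical per-cpu counter with a shared fall-back slot. */
struct percpu_counter_sketch {
	long count[NR_POSSIBLE_CPUS_SKETCH];
	long fallback_count;
};

/*
 * Returns 1 if the per-cpu slot was incremented, 0 if the fall-back was used.
 * Assumes cpu_id_start < NR_POSSIBLE_CPUS_SKETCH, which is the guarantee the
 * text above attributes to the kernel.
 */
static int counter_inc_sketch(const volatile struct rseq_fields_sketch *r,
			      struct percpu_counter_sketch *c)
{
	/* Load the optimistic value and use it as an index without a bounds check. */
	uint32_t cpu = r->cpu_id_start;
	long *slot = &c->count[cpu];

	/* Validate: cpu_id is (uint32_t)-1 without rseq, so this never matches. */
	if (cpu != r->cpu_id) {
		c->fallback_count++;	/* stand-in for a real fall-back mechanism */
		return 0;
	}

	/* In a real rseq critical section this store would be the commit. */
	(*slot)++;
	return 1;
}
```

With the fields at their "no rseq" values (0 and -1), the index 0 is still
safe to dereference and the call lands on the fall-back path; with matching
values, the per-cpu slot is updated.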

[...]

> 
>> > (Will not
>> > this need to be bigger on machines with bigger cache sizes?)
>> > 
>> > above it says:
>> > 
>> > +.B Structure size
>> > +This structure is extensible. Its size is passed as parameter to the
>> > +rseq system call.
>> > 
>> > I'm reading source, so maybe it refers to different structure.
>> 
>> It can be aligned on a larger multiple. This requirement of 32 bytes
>> is a minimum. Therefore, if we ever extend struct rseq, or if an
>> architecture shows benefit from aligning struct rseq on larger boundaries,
>> it is free to do so. It will still respect the requirement of alignment on
>> 32 bytes boundaries.
> 
> Well, elsewhere it said "This structure has a fixed size of 32 bytes."
> Now it says structure size is passed with every syscalls. Now I'm
> confused (but maybe that's caused by reading source, not formatted
> document).

The structure with a fixed size of 32 bytes is struct rseq_cs (layout
version 0).

The variable-sized structure, whose size is passed to the rseq system
call, is struct rseq.

struct rseq is typically placed in thread-local storage (TLS), and
contains an "rseq_cs" field, which is a pointer to the struct rseq_cs
descriptor describing the currently active rseq critical section.
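
As a rough picture of how the two structures relate, the following C
sketch may help. The field names other than cpu_id_start, cpu_id and
rseq_cs, the exact field order, and the helper rseq_set_cs_sketch() are
assumptions made for illustration; the types carry a _sketch suffix to
make clear they are not the definitions from the kernel headers, which
remain the authoritative reference.

```c
#include <stdint.h>

/* Sketch of the fixed-size critical section descriptor (version 0 layout). */
struct rseq_cs_sketch {
	uint32_t version;		/* assumed field: layout version, 0 here */
	uint32_t flags;			/* assumed field */
	uint64_t start_ip;		/* assumed field: start of the critical section */
	uint64_t post_commit_offset;	/* assumed field: length up to the commit */
	uint64_t abort_ip;		/* assumed field: where to resume on abort */
} __attribute__((aligned(32)));

/* Sketch of the extensible per-thread structure, typically kept in TLS. */
struct rseq_sketch {
	uint32_t cpu_id_start;
	uint32_t cpu_id;
	uint64_t rseq_cs;		/* address of the active descriptor, 0 if none */
	uint32_t flags;			/* assumed field */
} __attribute__((aligned(32)));

/* The descriptor is exactly 32 bytes; struct rseq is only required to be
 * aligned on (at least) 32 bytes and may grow in the future. */
_Static_assert(sizeof(struct rseq_cs_sketch) == 32, "descriptor is 32 bytes");
_Static_assert(_Alignof(struct rseq_sketch) >= 32, "struct rseq alignment");

/* Hypothetical helper: publish the descriptor of the critical section
 * that is about to be entered. */
static inline void rseq_set_cs_sketch(struct rseq_sketch *r,
				      const struct rseq_cs_sketch *cs)
{
	r->rseq_cs = (uint64_t)(uintptr_t)cs;
}
```

The alignment attribute is the GCC/Clang spelling; other compilers would
need their own equivalent.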

Hoping this clears up the confusion.

Thanks for the review!

Mathieu


> 
> Best regards,
>									Pavel
> 
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures)
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com