linux-kernel - Re: [RFC PATCH for 5.8 3/4] rseq: Introduce RSEQ_FLAG_RELIABLE_CPU

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <2088331919.943.1594118895344.JavaMail.zimbra@efficios.com>
Date:   Tue, 7 Jul 2020 06:48:15 -0400 (EDT)
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Florian Weimer <fw@...eb.enyo.de>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        paulmck <paulmck@...ux.ibm.com>,
        Boqun Feng <boqun.feng@...il.com>,
        "H. Peter Anvin" <hpa@...or.com>, Paul Turner <pjt@...gle.com>,
        linux-api <linux-api@...r.kernel.org>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Neel Natu <neelnatu@...gle.com>
Subject: Re: [RFC PATCH for 5.8 3/4] rseq: Introduce
 RSEQ_FLAG_RELIABLE_CPU_ID

----- On Jul 7, 2020, at 3:29 AM, Florian Weimer fw@...eb.enyo.de wrote:

> * Mathieu Desnoyers:
> 
>> commit 93b585c08d16 ("Fix: sched: unreliable rseq cpu_id for new tasks")
>> addresses an issue with cpu_id field of newly created processes. Expose
>> a flag which can be used by user-space to query whether the kernel
>> implements this fix.
>>
>> Considering that this issue can cause corruption of user-space per-cpu
>> data updated with rseq, it is recommended that user-space detects
>> availability of this fix by using the RSEQ_FLAG_RELIABLE_CPU_ID flag
>> either combined with registration or on its own before using rseq.
> 
> Presumably, the intent is that glibc uses RSEQ_FLAG_RELIABLE_CPU_ID to
> register the rseq area.  That will surely prevent glibc itself from
> activating rseq on broken kernels.  But if another rseq library
> performs registration and has not been updated to use
> RSEQ_FLAG_RELIABLE_CPU_ID, we still end up with an active rseq area
> (and incorrect CPU IDs from sched_getcpu in glibc).  So further glibc
> changes will be needed.  I suppose we could block third-party rseq
> registration with a registration of a hidden rseq area (not
> __rseq_abi).  But then the question is if any of the third-party rseq
> users are expecting the EINVAL error code from their failed
> registration.
> 
> The rseq registration state machine is quite tricky already, and the
> need to use RSEQ_FLAG_RELIABLE_CPU_ID would make it even more
> complicated.  Even if we implemented all the changes, it's all going
> to be essentially dead, untestable code in a few months, when the
> broken kernels are out of circulation.  It does not appear to be good
> investment to me.

Those are very good points. One possibility we have would be to let
glibc do the rseq registration without the RSEQ_FLAG_RELIABLE_CPU_ID
flag. On kernels with the bug present, the cpu_id field is still good
enough for typical uses of sched_getcpu() which does not appear to
have a very strict correctness requirement on returning the right
cpu number.

Then libraries and applications which require a reliable cpu_id field
could check this on their own by calling rseq with the
RSEQ_FLAG_RELIABLE_CPU_ID flag. This would not make the state more
complex in __rseq_abi, and let each rseq user decide about its own fate:
whether it uses rseq or keeps using an rseq-free fallback.

I am still tempted to allow combining RSEQ_FLAG_REGISTER | RSEQ_FLAG_RELIABLE_CPU_ID
for applications which would not be using glibc, and want to check this flag on
thread registration.

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com