[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1327322278.7807.1470862882633.JavaMail.zimbra@efficios.com>
Date: Wed, 10 Aug 2016 21:01:22 +0000 (UTC)
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Andy Lutomirski <luto@...capital.net>
Cc: Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Russell King <linux@....linux.org.uk>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
linux-api <linux-api@...r.kernel.org>,
Paul Turner <pjt@...gle.com>, Andrew Hunter <ahh@...gle.com>,
Andi Kleen <andi@...stfloor.org>,
Dave Watson <davejwatson@...com>, Chris Lameter <cl@...ux.com>,
Ben Maurer <bmaurer@...com>, rostedt <rostedt@...dmis.org>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Josh Triplett <josh@...htriplett.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will.deacon@....com>,
Michael Kerrisk <mtk.manpages@...il.com>,
Boqun Feng <boqun.feng@...il.com>
Subject: Re: [RFC PATCH v7 1/7] Restartable sequences system call
----- On Aug 10, 2016, at 4:09 PM, Andy Lutomirski luto@...capital.net wrote:
> On Wed, Aug 10, 2016 at 1:06 PM, Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:
<snip>
>>> u64 is a perfectly valid, if odd, userspace pointer on all
>>> architecures that I know of, and it's certainly a valid userspace
>>> pointer on x86 32-bit userspace (the high bits will just all be zero).
>>> Can you just use u64?
>>
>> My concern is about a 32-bit user-space putting garbage rather than zeroes
>> (on purpose) to fool the kernel on those upper 32 bits. Doing
>>
>> compat_ptr((compat_uptr_t)rseq_cs.start_ip)
>>
>> effectively ends up clearing the upper 32 bits.
>>
>> But since we only use those pointer values for comparisons, perhaps we
>> just don't care if a 32-bit userspace app try to shoot itself in
>> the foot by passing garbage upper 32 bits ?
>>
>
> How is garbage in the high bits any different than garbage in any
> other bits in there?
It's not :)
>
>>
>>> If this would be a performance problem on ARM, then maybe that's a
>>> reason to use compat helpers.
>>
>> We already use 64-bit values for the pointers, even on 32-bit. Normally
>> userspace just puts zeroes in the top bits. It's mostly a question of
>> clearing the top 32 bits or not when loading them in the kernel. If we
>> don't need to, then I can remove the compat code entirely, and we don't
>> care about user_64bit_mode() anymore, as you initially recommended.
>> Does it make sense ?
>
> Yes, I think so. I'd suggest just honoring all the bits.
OK, will do !
>
>>
>>>
>>>>
>>>>>
>>>>>
>>>>>>>> +SYSCALL_DEFINE2(rseq, struct rseq __user *, rseq, int, flags)
>>>>>>>> +{
>>>>>>>> + if (unlikely(flags))
>>>>>>>> + return -EINVAL;
>>>>>>>
>>>>>>> (add whitespace)
>>>>>>
>>>>>> fixed.
>>>>>>
>>>>>>>
>>>>>>>> + if (!rseq) {
>>>>>>>> + if (!current->rseq)
>>>>>>>> + return -ENOENT;
>>>>>>>> + return 0;
>>>>>>>> + }
>>>>>
>>>>> This looks entirely wrong. Setting rseq to NULL fails if it's already
>>>>> NULL but silently does nothing if rseq is already set? Surely it
>>>>> should always succeed and it should actually do something if rseq is
>>>>> set.
>>>>
>>>> From the proposed rseq(2) manpage:
>>>>
>>>> "A NULL rseq value can be used to check whether rseq is registered
>>>> for the current thread."
>>>>
>>>> The implementation does just that: it returns -1, errno=ENOENT if no
>>>> rseq is currently registered, or 0 if rseq is currently registered.
>>>
>>> I think that's problematic. Why can't you unregister an existing
>>> rseq? If you can't, how is a thread supposed to clean up after
>>> itself?
>>>
>>
>> Unregistering an existing thread rseq would require that we keep reference
>> counting, in case multiple libs and/or the app are using rseq. I am
>> trying to keep things as simple as needed.
>>
>> If I understand your concern, the problematic scenario would be at
>> thread exit (this is my current approximate understanding of glibc
>> handling of library TLS variable reclaim at thread exit):
>>
>> thread exits in userspace:
>> - glibc frees its rseq TLS memory area (in case the TLS is in a library),
>> - thread preempted before really exiting,
>> - kernel reads/writes to freed TLS memory.
>> - corruption may occur (e.g. memory re-allocated by another thread already)
>>
>> Am I getting it right ?
>
> Yes.
Hrm, then we should:
- add a rseq_refcount field to the task struct,
- increment this refcount whenever rseq receives a registration, after
ensuring that we are registering the same address as was previously
requested by preceding registrations for the thread (except if the
refcount was 0),
- When rseq receives a NULL address, decrement refcount. Set address to
NULL when it reaches 0.
Doing the refcounting in kernel-space rather than user-space allows us to
keep both registration/unregistration and refcount atomic, which simplify
things if we plan to use rseq from signal handlers.
With current glibc, a library that would lazily register and use rseq
without knowledge of the application would then have to use pthread_key_create()
to set a destr_function to run at thread exit, which would take care of
unregistration.
We could add a RSEQ_FORCE_UNREGISTER flag to rseq flags to allow future
glibc versions to force unregistering rseq before freeing its TLS memory,
just in case a userspace library omits to unregister itself.
Thoughts ?
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Powered by blists - more mailing lists