[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CALCETrWgXcWrfDhgjzeJ8e5kn4ukFLvC6steTp7GUzFJNYgVyw@mail.gmail.com>
Date: Thu, 11 Aug 2016 00:23:21 -0700
From: Andy Lutomirski <luto@...capital.net>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: Ben Maurer <bmaurer@...com>, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
Russell King <linux@....linux.org.uk>,
linux-api <linux-api@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Michael Kerrisk <mtk.manpages@...il.com>,
Dave Watson <davejwatson@...com>,
rostedt <rostedt@...dmis.org>, Will Deacon <will.deacon@....com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
Chris Lameter <cl@...ux.com>, Andi Kleen <andi@...stfloor.org>,
Josh Triplett <josh@...htriplett.org>,
Paul Turner <pjt@...gle.com>,
Boqun Feng <boqun.feng@...il.com>,
Catalin Marinas <catalin.marinas@....com>,
Andrew Hunter <ahh@...gle.com>,
"H. Peter Anvin" <hpa@...or.com>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: [RFC PATCH v7 1/7] Restartable sequences system call
On Aug 11, 2016 12:01 AM, "Mathieu Desnoyers"
<mathieu.desnoyers@...icios.com> wrote:
>
> ----- On Aug 10, 2016, at 4:09 PM, Andy Lutomirski luto@...capital.net wrote:
>
> > On Wed, Aug 10, 2016 at 1:06 PM, Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:
>
> <snip>
>
> >>> u64 is a perfectly valid, if odd, userspace pointer on all
> >>> architecures that I know of, and it's certainly a valid userspace
> >>> pointer on x86 32-bit userspace (the high bits will just all be zero).
> >>> Can you just use u64?
> >>
> >> My concern is about a 32-bit user-space putting garbage rather than zeroes
> >> (on purpose) to fool the kernel on those upper 32 bits. Doing
> >>
> >> compat_ptr((compat_uptr_t)rseq_cs.start_ip)
> >>
> >> effectively ends up clearing the upper 32 bits.
> >>
> >> But since we only use those pointer values for comparisons, perhaps we
> >> just don't care if a 32-bit userspace app try to shoot itself in
> >> the foot by passing garbage upper 32 bits ?
> >>
> >
> > How is garbage in the high bits any different than garbage in any
> > other bits in there?
>
> It's not :)
>
> >
> >>
> >>> If this would be a performance problem on ARM, then maybe that's a
> >>> reason to use compat helpers.
> >>
> >> We already use 64-bit values for the pointers, even on 32-bit. Normally
> >> userspace just puts zeroes in the top bits. It's mostly a question of
> >> clearing the top 32 bits or not when loading them in the kernel. If we
> >> don't need to, then I can remove the compat code entirely, and we don't
> >> care about user_64bit_mode() anymore, as you initially recommended.
> >> Does it make sense ?
> >
> > Yes, I think so. I'd suggest just honoring all the bits.
>
> OK, will do !
>
> >
> >>
> >>>
> >>>>
> >>>>>
> >>>>>
> >>>>>>>> +SYSCALL_DEFINE2(rseq, struct rseq __user *, rseq, int, flags)
> >>>>>>>> +{
> >>>>>>>> + if (unlikely(flags))
> >>>>>>>> + return -EINVAL;
> >>>>>>>
> >>>>>>> (add whitespace)
> >>>>>>
> >>>>>> fixed.
> >>>>>>
> >>>>>>>
> >>>>>>>> + if (!rseq) {
> >>>>>>>> + if (!current->rseq)
> >>>>>>>> + return -ENOENT;
> >>>>>>>> + return 0;
> >>>>>>>> + }
> >>>>>
> >>>>> This looks entirely wrong. Setting rseq to NULL fails if it's already
> >>>>> NULL but silently does nothing if rseq is already set? Surely it
> >>>>> should always succeed and it should actually do something if rseq is
> >>>>> set.
> >>>>
> >>>> From the proposed rseq(2) manpage:
> >>>>
> >>>> "A NULL rseq value can be used to check whether rseq is registered
> >>>> for the current thread."
> >>>>
> >>>> The implementation does just that: it returns -1, errno=ENOENT if no
> >>>> rseq is currently registered, or 0 if rseq is currently registered.
> >>>
> >>> I think that's problematic. Why can't you unregister an existing
> >>> rseq? If you can't, how is a thread supposed to clean up after
> >>> itself?
> >>>
> >>
> >> Unregistering an existing thread rseq would require that we keep reference
> >> counting, in case multiple libs and/or the app are using rseq. I am
> >> trying to keep things as simple as needed.
> >>
> >> If I understand your concern, the problematic scenario would be at
> >> thread exit (this is my current approximate understanding of glibc
> >> handling of library TLS variable reclaim at thread exit):
> >>
> >> thread exits in userspace:
> >> - glibc frees its rseq TLS memory area (in case the TLS is in a library),
> >> - thread preempted before really exiting,
> >> - kernel reads/writes to freed TLS memory.
> >> - corruption may occur (e.g. memory re-allocated by another thread already)
> >>
> >> Am I getting it right ?
> >
> > Yes.
>
> Hrm, then we should:
>
> - add a rseq_refcount field to the task struct,
> - increment this refcount whenever rseq receives a registration, after
> ensuring that we are registering the same address as was previously
> requested by preceding registrations for the thread (except if the
> refcount was 0),
> - When rseq receives a NULL address, decrement refcount. Set address to
> NULL when it reaches 0.
>
> Doing the refcounting in kernel-space rather than user-space allows us to
> keep both registration/unregistration and refcount atomic, which simplify
> things if we plan to use rseq from signal handlers.
>
> With current glibc, a library that would lazily register and use rseq
> without knowledge of the application would then have to use pthread_key_create()
> to set a destr_function to run at thread exit, which would take care of
> unregistration.
That sounds reasonable at first glance.
>
> We could add a RSEQ_FORCE_UNREGISTER flag to rseq flags to allow future
> glibc versions to force unregistering rseq before freeing its TLS memory,
> just in case a userspace library omits to unregister itself.
Sounds good too.
Powered by blists - more mailing lists