Message-ID: <alpine.DEB.2.20.1611252007230.3602@nanos>
Date:   Fri, 25 Nov 2016 20:13:24 +0100 (CET)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Peter Zijlstra <peterz@...radead.org>
cc:     mingo@...nel.org, juri.lelli@....com, rostedt@...dmis.org,
        xlpang@...hat.com, bigeasy@...utronix.de,
        linux-kernel@...r.kernel.org, mathieu.desnoyers@...icios.com,
        jdesfossez@...icios.com, bristot@...hat.com
Subject: Re: [RFC][PATCH 4/4] futex: Rewrite FUTEX_UNLOCK_PI

On Fri, 25 Nov 2016, Peter Zijlstra wrote:
> On Fri, Nov 25, 2016 at 10:23:26AM +0100, Peter Zijlstra wrote:
> > On Thu, Nov 24, 2016 at 07:58:07PM +0100, Peter Zijlstra wrote:
> > 
> > > OK, so clearly I'm confused. So let me try again.
> > > 
> > > LOCK_PI does, in one function, both lookup_pi_state and fixup_owner. If
> > > fixup_owner fails with -EAGAIN, we can redo the pi_state lookup.
> > > 
> > > The requeue stuff, otoh, has one each. REQUEUE_WAIT has fixup_owner(),
> > > CMP_REQUEUE has lookup_pi_state. Therefore, fixup_owner failing with
> > > -EAGAIN leaves us dead in the water. There's nothing to go back to and
> > > retry.
> > > 
> > > So far, so 'good', right?
> > > 
> > > Now, as far as I understand this requeue stuff, we have 2 futexes, an
> > > inner futex and an outer futex. The inner futex is always 'locked' and
> > > serves as a collection pool for waiting threads.

Yes.
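
From the user space side that pairing looks roughly like the sketch below.
Only an illustration, not how glibc spells it: 'inner', 'outer' and the
helper are made up names, the ops are the ones documented in futex(2).

#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

static uint32_t inner;		/* collection pool, never acquirable */
static uint32_t outer;		/* the real PI lock (uaddr2, TID protocol) */

static long sys_futex(uint32_t *uaddr, int op, uint32_t val,
		      void *timeout_or_val2, uint32_t *uaddr2, uint32_t val3)
{
	return syscall(SYS_futex, uaddr, op, val, timeout_or_val2, uaddr2,
		       val3);
}

/*
 * Waiter: block on the inner futex if it still reads 'seen'. A return of
 * 0 means the kernel requeued us to 'outer' and we come back owning it.
 */
static long wait_inner_requeue_pi(uint32_t seen)
{
	return sys_futex(&inner, FUTEX_WAIT_REQUEUE_PI, seen, NULL,
			 &outer, 0);
}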
 
> > > The requeue crap picks one (or more) waiters from the inner futex and
> > > sticks them on the outer futex, which gives them a chance to run.

No. Only one of them gets the chance to run, and only if it can acquire the
futex in user space directly. If the futex is locked, it is queued on the
outer futex/rtmutex and gets to run when the outer futex is unlocked by the
owner.
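
For reference, 'acquire the futex in user space directly' refers to the
usual PI futex value protocol on uaddr2: 0 when free, the owner's TID when
held, FUTEX_WAITERS set once somebody is queued in the kernel. In the
requeue case the kernel does the cmpxchg on behalf of the top waiter, but
it's the same protocol as the normal lock fast path. Sketch only, names
made up:

#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

static uint32_t outer;		/* the PI futex word (uaddr2) */

static void lock_outer(void)
{
	uint32_t expected = 0;
	uint32_t tid = syscall(SYS_gettid);

	/* Fast path: 0 -> TID, we own it without entering the kernel. */
	if (__atomic_compare_exchange_n(&outer, &expected, tid, 0,
					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
		return;

	/* Slow path: queue on the rtmutex behind the current owner. */
	syscall(SYS_futex, &outer, FUTEX_LOCK_PI, 0, NULL, NULL, 0);
}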
 
> > > So WAIT_REQUEUE blocks on the inner futex, but knows that if it ever
> > > gets woken, it will be on the outer futex, and hence needs to
> > > fixup_owner if the futex and rt_mutex state got out of sync.

It can be woken on the inner one as well (signal/timeout), but that does
not affect the outer futex, so it's not interesting.

> > > CMP_REQUEUEUEUE picks the one (or more) waiters of the inner futex and
> > > sticks them on the outer futex.
> > > 
> > > So far, so 'good' ?
> > > 
> > > The thing I'm not entirely sure on is what happens with the outer futex,
> > > do we first LOCK_PI it before doing CMP_REQUEUE, giving us waiters, and
> > > then UNLOCK_PI to let them rip? Or do we just CMP_REQUEUE and then let
> > > whoever wins finish with UNLOCK_PI?
> > > 
> > > 
> > > In any case, I don't think it matters much, either way we can race
> > > between the 'last' UNLOCK_PI and getting rt_mutex waiters and then hit
> > > the &init_task funny state, such that WAIT_REQUEUE waking hits EAGAIN
> > > and we're 'stuck'.

Right, that would be exceptionally bad.
 
> > > Now, if we always CMP_REQUEUE to a locked outer futex, then we cannot
> > > know, at CMP_REQUEUE time, who will win and cannot fix up.
> > 
> > OTOH, if we always first LOCK_PI before doing CMP_REQUEUE, I don't think
> > we can hit the funny state, LOCK_PI will have fixed it up for us.
> > 
> > So the question is, do we mandate LOCK_PI before CMP_REQUEUE?
> 
> Going by futex_requeue(), the first thing it does, after validation and
> getting hbs locked, is futex_proxy_trylock_atomic(), which per the
> comment above it will attempt to acquire uaddr2.
> 
> So no such mandate, otherwise that op would not exist and we'd only need
> to validate that uaddr2 was 'current'.

Correct. It can be done with uaddr2 locked or not. That's why we try to
take it directly first.
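
In user space terms the two orderings look roughly like this. Again only a
sketch: 'inner', 'outer' and the requeue count are made up, nr_wake must be
1 for CMP_REQUEUE_PI, and val3 is the expected value of the inner futex
(EAGAIN if it changed).

#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

static uint32_t inner;		/* wait pool (uaddr) */
static uint32_t outer;		/* PI futex (uaddr2) */

/* Requeue with uaddr2 held by the caller. */
static void requeue_locked(uint32_t inner_val, unsigned long nr)
{
	syscall(SYS_futex, &outer, FUTEX_LOCK_PI, 0, NULL, NULL, 0);
	/*
	 * The trylock on uaddr2 cannot succeed while we hold it, so the
	 * top waiter goes onto the rtmutex like the rest and everybody
	 * waits for the unlock below.
	 */
	syscall(SYS_futex, &inner, FUTEX_CMP_REQUEUE_PI, 1, (void *)nr,
		&outer, inner_val);
	syscall(SYS_futex, &outer, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0);
}

/* Requeue with uaddr2 (possibly) unlocked. */
static void requeue_unlocked(uint32_t inner_val, unsigned long nr)
{
	/*
	 * futex_proxy_trylock_atomic() grabs uaddr2 for the top waiter if
	 * it is free; otherwise everybody ends up on the rtmutex and
	 * whoever wins finishes with UNLOCK_PI.
	 */
	syscall(SYS_futex, &inner, FUTEX_CMP_REQUEUE_PI, 1, (void *)nr,
		&outer, inner_val);
}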

Thanks,

	tglx
