Message-ID: <20150826181659.GW16853@twins.programming.kicks-ass.net>
Date: Wed, 26 Aug 2015 20:16:59 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Oleg Nesterov <oleg@...hat.com>,
Paul McKenney <paulmck@...ux.vnet.ibm.com>,
Ingo Molnar <mingo@...nel.org>, mtk.manpages@...il.com,
dvhart@...radead.org, dave@...olabs.net,
Vineet.Gupta1@...opsys.com, ralf@...ux-mips.org,
ddaney@...iumnetworks.com, Will Deacon <will.deacon@....com>,
linux-kernel@...r.kernel.org
Subject: futex atomic vs ordering constraints
Hi all,
I tried to keep this email short, but failed miserably. For the TL;DR,
skip to the tail.
So the question of ordering constraints of futex atomic operations has
come up recently:
http://marc.info/?l=linux-kernel&m=143894765931868
This email will attempt to describe the two primitives and start a
discussion on the constraints.
* futex_atomic_op_inuser()
There is but a single callsite of this function: futex_wake_op().
It being part of a wake primitive seems to suggest a (RCsc) RELEASE is
the strongest required (the RCsc part because I don't think we want to
expose RCpc to userspace if we don't have to).
The immediate scenario where this is important is:
    CPU0                CPU1                CPU2

    futex_lock(); -> uncontended user acquire
    A = 1;
                        futex_lock(); -> kernel, set pending, sleep
    B = 1;
    futex_unlock();
      if pending
        <kernel>
          futex_wake_op
            spin_lock(bh->lock)
            RELEASE
            futex_atomic_op_inuser(); -> futex unlocked

                                            futex_lock() -> uncontended user steal
                                            load A;
In other words, the moment we perform the WAKE_OP userspace can observe
the 'lock' as unlocked and do a lock (steal) acquire of the 'lock'.
If userspace succeeds with this acquire, we need full serialization of
the locked (RCsc) variables (eg A and B in the above).
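As a rough userspace illustration of the handoff above, here is a minimal C11 sketch. Everything in it is hypothetical (lock_word, owner(), stealer(), run_handoff()); the release store merely stands in for the unlocking store done by the WAKE_OP, and C11 release/acquire does not attempt to model the RCsc versus RCpc distinction. It only shows that a stealer whose acquire succeeds must observe both A and B.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical toy lock word: 0 = unlocked, 1 = locked. */
static atomic_int lock_word;
static int A, B;            /* plain data protected by the toy lock */
static int seen_a, seen_b;  /* what the stealer observed */

/* Plays CPU0: writes under the lock, then unlocks with RELEASE. */
static void *owner(void *arg)
{
    (void)arg;
    A = 1;
    B = 1;
    /* Stand-in for the unlocking store performed by the wake op. */
    atomic_store_explicit(&lock_word, 0, memory_order_release);
    return NULL;
}

/* Plays CPU2: steals the lock with an ACQUIRE cmpxchg, then loads. */
static void *stealer(void *arg)
{
    int expected;

    (void)arg;
    do {
        expected = 0;
    } while (!atomic_compare_exchange_weak_explicit(&lock_word, &expected, 1,
                                                    memory_order_acquire,
                                                    memory_order_relaxed));
    seen_a = A;
    seen_b = B;
    return NULL;
}

/* Returns 1 if the stealer saw both stores made under the lock. */
int run_handoff(void)
{
    pthread_t t0, t2;
    int rc;

    A = B = 0;
    seen_a = seen_b = 0;
    atomic_store(&lock_word, 1);    /* the owner starts out holding the lock */

    rc = pthread_create(&t2, NULL, stealer, NULL);
    assert(rc == 0);
    rc = pthread_create(&t0, NULL, owner, NULL);
    assert(rc == 0);

    pthread_join(t0, NULL);
    pthread_join(t2, NULL);
    return seen_a == 1 && seen_b == 1;
}
```

Weakening the store in owner() to memory_order_relaxed would make the plain accesses to A and B a data race, which is the userspace analogue of the wake op providing no ordering and nothing before it implying any.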
Of course, if anything else prior to futex_atomic_op_inuser() implies an
(RCsc) RELEASE or stronger the primitive can do without providing
anything itself.
This turns out to be the case: a successful get_futex_key() implies a
full memory barrier (recent: 1d0dcb3ad9d3 ("futex: Implement lockless
wakeups")).
And since get_futex_key() is fundamental to doing _anything_ with a
futex, I think it's semi-sane to rely on this.
So we have two valid options:
- RCsc RELEASE
- no ordering at all
Current implementation:
    alpha:      MB ll/sc                RELEASE
    arm64:      ll/sc-release MB        FULL
    arm:        MB ll/sc                RELEASE
    mips:       ll/sc MB                ACQUIRE
    powerpc:    lwsync ll/sc sync       FULL
* futex_atomic_cmpxchg_inatomic()
This is called from:
lock_pi_update_atomic
wake_futex_pi
fixup_pi_state_owner
futex_unlock_pi
handle_futex_death
But I think we can form a position from just two of them:
futex_unlock_pi() and lock_pi_update_atomic()
these end up being ACQUIRE and RELEASE, and a combination of these two
would give us a requirement for full serialization.
And unlike the previous primitive, we cannot talk this one away. Even though every
futex op needs a get_futex_key() which implies a full memory barrier,
and every get_futex_key() needs a put_futex_key(), the latter does _NOT_
imply a full barrier.
So while we could relax the RELEASE semantics we cannot relax the
ACQUIRE semantics.
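To make the ACQUIRE/RELEASE pairing concrete, here is a small C11 sketch of one fully ordered cmpxchg helper serving both a lock-side and an unlock-side caller. The names toy_cmpxchg_full(), toy_lock_pi() and toy_unlock_pi() are made up for illustration and are not the kernel functions; memory_order_seq_cst is used as a userspace stand-in for "fully serialized".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: compare-and-exchange that returns the value
 * observed at *uaddr, in the style of a kernel cmpxchg. Because the
 * same helper serves both the lock and unlock paths, it is fully
 * ordered rather than only ACQUIRE or only RELEASE.
 */
static uint32_t toy_cmpxchg_full(_Atomic uint32_t *uaddr,
                                 uint32_t oldval, uint32_t newval)
{
    uint32_t seen = oldval;

    assert(uaddr != NULL);
    /* On failure, 'seen' is overwritten with the current value. */
    atomic_compare_exchange_strong_explicit(uaddr, &seen, newval,
                                            memory_order_seq_cst,
                                            memory_order_seq_cst);
    return seen;
}

/* Lock side (needs ACQUIRE): 0 -> tid. Returns 1 on success. */
int toy_lock_pi(_Atomic uint32_t *uaddr, uint32_t tid)
{
    return toy_cmpxchg_full(uaddr, 0, tid) == 0;
}

/* Unlock side (needs RELEASE): tid -> 0. Returns 1 on success. */
int toy_unlock_pi(_Atomic uint32_t *uaddr, uint32_t tid)
{
    return toy_cmpxchg_full(uaddr, tid, 0) == tid;
}
```

If the helper were only RELEASE, toy_lock_pi() would lose the ordering it needs for the critical section that follows it; if it were only ACQUIRE, toy_unlock_pi() would lose the ordering it needs for the critical section that precedes it.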
Then there is handle_futex_death(), which is difficult. I _think_ it
wants to be a RELEASE, but the state is corrupted anyhow, and I can well
imagine not wanting to play any games here and going fully serialized
like we're used to with cmpxchg.
Now the robust code doesn't use {get,put}_futex_key(), so there are no
implied barriers here.
Which leaves us all with a great big mess.
Current implementation:
    alpha:      MB ll/sc                RELEASE
    arm64:      ll/sc-release MB        FULL
    arm:        MB ll/sc MB             FULL
    mips:       ll/sc MB                ACQUIRE
    powerpc:    lwsync ll/sc sync       FULL
There are a few options:
1) punt, mandate they're both fully ordered and stop thinking about it
2) make them both fully relaxed, rely on implied barriers and employ
smp_mb__{before,after}_atomic in key places
Given the current state of things and that I don't really think there is
a compelling performance argument to be made for 2, I would suggest we
go with 1.
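For comparison, the two options can be sketched in C11 as follows. This is only an analogy: the toy_* functions are hypothetical, and atomic_thread_fence(memory_order_seq_cst) is used as a loose userspace stand-in for smp_mb__{before,after}_atomic, not as a claim about how the kernel barriers are implemented.

```c
#include <stdatomic.h>

/* Option 1: the primitive itself is fully ordered. */
int toy_fetch_add_full(atomic_int *v, int delta)
{
    return atomic_fetch_add_explicit(v, delta, memory_order_seq_cst);
}

/* Option 2, part one: the primitive provides no ordering at all. */
int toy_fetch_add_relaxed(atomic_int *v, int delta)
{
    return atomic_fetch_add_explicit(v, delta, memory_order_relaxed);
}

/*
 * Option 2, part two: a call site that cannot rely on an implied
 * barrier adds explicit ones around the relaxed primitive.
 */
int toy_fetch_add_with_barriers(atomic_int *v, int delta)
{
    int old;

    atomic_thread_fence(memory_order_seq_cst);  /* "before atomic" */
    old = toy_fetch_add_relaxed(v, delta);
    atomic_thread_fence(memory_order_seq_cst);  /* "after atomic" */
    return old;
}
```

With option 2, every call site has to be audited to decide whether it needs the fenced variant; with option 1, that audit disappears at the cost of barriers that some call sites do not strictly need.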