Message-ID: <20150428182410.GM5029@twins.programming.kicks-ass.net>
Date: Tue, 28 Apr 2015 20:24:10 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Chris Metcalf <cmetcalf@...hip.com>
Cc: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Manfred Spraul <manfred@...orfullife.com>,
Oleg Nesterov <oleg@...hat.com>,
Kirill Tkhai <ktkhai@...allels.com>,
linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
Josh Poimboeuf <jpoimboe@...hat.com>
Subject: Re: [PATCH 2/2] [PATCH] sched: Add smp_rmb() in task rq locking cycles

On Tue, Apr 28, 2015 at 02:00:48PM -0400, Chris Metcalf wrote:
> Yes, tilepro can do 16-bit atomic load/stores. The reason we didn't use
> your approach (basically having tns provide locking for the head/tail)
> is just a perceived efficiency gain from rolling the tns lock into the head.
>
> The current tilepro arch_spin_lock() is just three mesh network transactions
> (tns, store, load). Your proposed spin lock is five (tns, load, store,
> store, load).
> Or, looking at it from a core-centric perspective, the current arch_spin_lock()
> only has to wait on requests from the mesh network twice (tns, load), basically
> once for each member of the lock structure; your proposed version is three
> (tns, load, load).
>
> I don't honestly know how critical this difference is, but that's why I
> designed it the way I did.
Makes sense. Good reason ;-)
> I think your goal with your proposed redesign is being able to atomically
> read head and tail together for arch_spin_unlock_wait(), but I don't see
> why that's better than just reading head, checking it's not equal to tail
> with a separate read, then spinning waiting for head to change.
Right, that should be perfectly fine indeed.
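A minimal user-space sketch of that wait loop, using C11 atomics and the
quoted lock's field names (in that code it is the unlock path that advances
lock->tail, so that is the counter the spin watches; this is an illustration
of the idea, not the tilepro code):

```c
#include <stdatomic.h>

/* Field names follow the lock quoted below: head is the next ticket to
 * hand out (bumped at lock time), tail is the ticket now being served
 * (bumped at unlock time). The lock is free iff head == tail. */
struct ticket_lock {
	_Atomic unsigned short head;
	_Atomic unsigned short tail;
};

static void ticket_unlock_wait(struct ticket_lock *lock)
{
	unsigned short tail = atomic_load(&lock->tail);

	if (atomic_load(&lock->head) == tail)
		return;			/* not locked, nothing to wait for */

	/* Locked: only an unlock advances tail, so once it moves, the
	 * holder we observed above must have released the lock. */
	while (atomic_load(&lock->tail) == tail)
		;			/* cpu_relax() in the kernel */
}
```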
A few questions:
> >static inline void arch_spin_lock(arch_spinlock_t *lock)
> >{
> > unsigned short head, tail;
> >
> > ___tns_lock(&lock->lock); /* XXX does the TNS imply a ___sync? */
Does it? Something needs to provide the ACQUIRE semantics.
> > head = lock->head;
> > lock->head++;
> > ___tns_unlock(&lock->lock);
> >
> > while (READ_ONCE(lock->tail) != head)
> > cpu_relax();
> >}
> >
> >static inline void arch_spin_unlock(arch_spinlock_t *lock)
> >{
> > /*
> > * can do with regular load/store because the lock owner
> > * is the only one going to do stores to the tail
> > */
> > unsigned short tail = READ_ONCE(lock->tail);
> > smp_mb(); /* MB is stronger than RELEASE */
Note that your code uses wmb(); wmb() is, strictly speaking, not correct,
as it's weaker than RELEASE.
_However_, it doesn't make any practical difference, since all three
barriers end up emitting __sync(), so it's not a bug per se.
> > WRITE_ONCE(lock->tail, tail + 1);
> >}
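For reference, the whole scheme can be modelled in portable C11, with
atomic_exchange() standing in for tilepro's tns instruction; the names and
the user-space setting are mine, so treat this as a sketch of the proposal
rather than the actual kernel code:

```c
#include <stdatomic.h>

struct tns_ticket_lock {
	atomic_int tns;			/* tns word: 0 = free, 1 = held */
	unsigned short head;		/* next ticket to hand out */
	_Atomic unsigned short tail;	/* ticket now being served */
};

/* tns atomically stores 1 and returns the old value; emulate it with
 * atomic_exchange(), whose default seq_cst ordering also provides the
 * ACQUIRE semantics asked about above. */
static void tns_lock(atomic_int *w)
{
	while (atomic_exchange(w, 1))
		;			/* spin until we see the 0 -> 1 transition */
}

static void tns_unlock(atomic_int *w)
{
	atomic_store(w, 0);
}

static void ticket_lock(struct tns_ticket_lock *l)
{
	unsigned short head;

	tns_lock(&l->tns);		/* serializes access to head */
	head = l->head++;
	tns_unlock(&l->tns);

	while (atomic_load(&l->tail) != head)
		;			/* wait until our ticket is served */
}

static void ticket_unlock(struct tns_ticket_lock *l)
{
	/* The lock holder is the only writer to tail, so a plain
	 * load/store pair suffices; the seq_cst store here is (more
	 * than) the RELEASE ordering discussed above. */
	unsigned short tail = atomic_load(&l->tail);
	atomic_store(&l->tail, tail + 1);
}
```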