Message-ID: <20150730153452.GG27280@linux.vnet.ibm.com>
Date: Thu, 30 Jul 2015 08:34:52 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, mingo@...nel.org,
jiangshanlai@...il.com, dipankar@...ibm.com,
akpm@...ux-foundation.org, mathieu.desnoyers@...icios.com,
josh@...htriplett.org, tglx@...utronix.de, rostedt@...dmis.org,
dhowells@...hat.com, edumazet@...gle.com, dvhart@...ux.intel.com,
fweisbec@...il.com, oleg@...hat.com, bobby.prani@...il.com,
dave@...olabs.net, waiman.long@...com
Subject: Re: [PATCH tip/core/rcu 19/19] rcu: Add fastpath bypassing funnel locking

On Thu, Jul 30, 2015 at 04:44:55PM +0200, Peter Zijlstra wrote:
> On Fri, Jul 17, 2015 at 04:29:24PM -0700, Paul E. McKenney wrote:
>
> > /*
> > + * First try directly acquiring the root lock in order to reduce
> > + * latency in the common case where expedited grace periods are
> > + * rare. We check mutex_is_locked() to avoid pathological levels of
> > + * memory contention on ->exp_funnel_mutex in the heavy-load case.
> > + */
> > + rnp0 = rcu_get_root(rsp);
> > + if (!mutex_is_locked(&rnp0->exp_funnel_mutex)) {
> > + if (mutex_trylock(&rnp0->exp_funnel_mutex)) {
> > + if (sync_exp_work_done(rsp, rnp0, NULL,
> > + &rsp->expedited_workdone0, s))
> > + return NULL;
> > + return rnp0;
> > + }
> > + }
>
> So our 'new' locking primitives do things like:
>
> static __always_inline int queued_spin_trylock(struct qspinlock *lock)
> {
> if (!atomic_read(&lock->val) &&
> (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) == 0))
> return 1;
> return 0;
> }
>
> mutexes do not do this.
>
> Now I suppose the question is: does that extra read slow down the
> (common) uncontended case?  (Remember, we should optimize locks for the
> uncontended case; heavy lock contention should be fixed with better
> locking schemes, not better lock implementations.)
>
> Davidlohr, Waiman, do we have data on this?
>
> If the extra read before the cmpxchg() does not hurt, we should do the
> same for mutex and make the above redundant.
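
For concreteness, the suggested change would amount to something like
the following sketch of the mutex-trylock fast path.  This is
illustrative only, not actual kernel code: the mutex_trylock_fast()
name is made up, and it assumes the ->count convention of the time,
where 1 means unlocked and 0 or negative means held.

/*
 * Hypothetical sketch: queued_spin_trylock()'s read-before-cmpxchg
 * pattern applied to the mutex trylock fast path.
 */
static __always_inline int mutex_trylock_fast(struct mutex *lock)
{
	/*
	 * Plain read first: skip the cmpxchg(), and avoid dirtying
	 * the cache line, if the mutex appears to be held.
	 */
	if (atomic_read(&lock->count) != 1)
		return 0;
	/* Looks free; a single cmpxchg() decides ownership. */
	return atomic_cmpxchg(&lock->count, 1, 0) == 1;
}
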
I am pretty sure that different hardware wants it done differently. :-/
So I agree that hard data would be good.

I could probably further optimize the RCU code by checking for a
single-node tree, but I am not convinced that this is worthwhile.
However, skipping three cache misses in the uncontended case is
definitely worthwhile, hence this patch. ;-)
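
For reference, the single-node check mentioned above could look
something like the following.  This is a hypothetical sketch, not part
of the patch, and it guesses at the intent: with only the root
rcu_node, there is no funnel to walk, so the root's mutex could simply
be acquired directly.

/*
 * Hypothetical sketch of a single-node-tree special case: the root
 * is the only rcu_node, so take its mutex unconditionally rather
 * than attempting the trylock heuristic or the funnel walk.
 */
if (rcu_num_nodes == 1) {
	rnp0 = rcu_get_root(rsp);
	mutex_lock(&rnp0->exp_funnel_mutex);
	if (sync_exp_work_done(rsp, rnp0, NULL,
			       &rsp->expedited_workdone0, s))
		return NULL;
	return rnp0;
}
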
							Thanx, Paul