Message-ID: <57BCA5B1.1010401@hpe.com>
Date: Tue, 23 Aug 2016 15:36:17 -0400
From: Waiman Long <waiman.long@....com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Jason Low <jason.low2@....com>,
Davidlohr Bueso <dave@...olabs.net>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Ding Tianhong <dingtianhong@...wei.com>,
Thomas Gleixner <tglx@...utronix.de>,
Will Deacon <Will.Deacon@....com>,
Ingo Molnar <mingo@...hat.com>,
Imre Deak <imre.deak@...el.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Tim Chen <tim.c.chen@...ux.intel.com>,
"Paul E. McKenney" <paulmck@...ibm.com>, <jason.low2@...com>
Subject: Re: [RFC][PATCH 0/3] locking/mutex: Rewrite basic mutex
On 08/23/2016 12:57 PM, Peter Zijlstra wrote:
> On Tue, Aug 23, 2016 at 09:35:03AM -0700, Jason Low wrote:
>> On Tue, 2016-08-23 at 09:17 -0700, Davidlohr Bueso wrote:
>>> What's the motivation here? Is it just to unify counter and owner for
>>> the starvation issue? If so, is this really the path we wanna take for
>>> a small debug corner case?
>> And we thought our other patch was a bit invasive :-)
> So I've wanted to do something like this for a while now, and Linus
> saying he wanted to always enable the spinning and basically reduce
> special cases made me bite the bullet and just do it to see what it
> would look like.
>
> So it not only unifies counter and owner for the starvation case, it
> does so to allow spinning and debug as well as lock handoff.
> It collapses the whole count+owner+yield_to_owner into a single
> variable.
>
> It obviously is a tad invasive, but it does make things more similar to
> rt-mutex and pi futex, both of which track the owner and pending in the
> primary 'word'.
>
> That said, I don't particularly like the new mutex_unlock() code, its
> rather more heavy than I would like, although typically the word is
> uncontended at unlock and we'd only need a single go at the
> cmpxchg-loop.
>
>
I think this is the right way to go. There isn't any big change in the
slowpath, so the contended performance should be the same. The fastpath,
however, will get a bit slower: a single atomic op plus a jump
instruction (a single cacheline load) is replaced by a read-and-test
followed by a cmpxchg (potentially 2 cacheline loads), which will be
somewhat slower than the optimized assembly code. Alternatively, you can
replace the __mutex_trylock() in mutex_lock() with just a blind cmpxchg
to optimize the fastpath further. A cmpxchg will still be a tiny bit
slower than other atomic ops, but it will be more acceptable, I think.
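
To make the two fastpath variants concrete, here is a rough userspace
sketch. It is not the code from the patch: the sketch_* names are made
up for illustration, C11 atomics stand in for the kernel's
atomic_long_*() helpers, and the owner word is assumed to be 0 when the
lock is free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the unified owner word; 0 means unlocked. */
struct sketch_mutex {
	atomic_uintptr_t owner;
};

/*
 * Read-and-test followed by cmpxchg: the load and the cmpxchg can each
 * touch the lock's cacheline, hence "potentially 2 cacheline loads".
 */
static bool sketch_trylock_checked(struct sketch_mutex *m, uintptr_t self)
{
	uintptr_t expected = 0;

	assert(self != 0);	/* 0 is reserved for "unlocked" */
	if (atomic_load_explicit(&m->owner, memory_order_relaxed) != 0)
		return false;
	return atomic_compare_exchange_strong_explicit(&m->owner, &expected,
			self, memory_order_acquire, memory_order_relaxed);
}

/* Blind cmpxchg: assume the lock is free and go straight to the atomic op. */
static bool sketch_trylock_blind(struct sketch_mutex *m, uintptr_t self)
{
	uintptr_t expected = 0;

	assert(self != 0);
	return atomic_compare_exchange_strong_explicit(&m->owner, &expected,
			self, memory_order_acquire, memory_order_relaxed);
}
```

Both variants succeed or fail in the same cases; the blind one just
skips the initial read in the hope that the lock is uncontended.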
BTW, I got the following compilation error when I tried your patch:

drivers/gpu/drm/i915/i915_gem_shrinker.c: In function ‘mutex_is_locked_by’:
drivers/gpu/drm/i915/i915_gem_shrinker.c:44:22: error: invalid operands to binary == (have ‘atomic_long_t’ and ‘struct task_struct *’)
  return mutex->owner == task;
                      ^
  CC [M]  drivers/gpu/drm/i915/intel_psr.o
drivers/gpu/drm/i915/i915_gem_shrinker.c:49:1: warning: control reaches end of non-void function [-Wreturn-type]
 }
 ^
make[4]: *** [drivers/gpu/drm/i915/i915_gem_shrinker.o] Error 1

Apparently, you need to check whether other code also accesses the
owner field directly.
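
One way such callers could be handled is to go through an accessor
instead of comparing the field directly. The following is only a
userspace sketch under assumptions: the sketch_* names and the flag mask
are hypothetical, and I am assuming the low bits of the owner word carry
flags while the rest is the task pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: the two low bits of the owner word hold flags. */
#define SKETCH_FLAG_MASK ((uintptr_t)0x3)

/* Hypothetical stand-in for struct task_struct; aligned so low bits are free. */
struct sketch_task {
	_Alignas(8) int pid;
};

struct sketch_mutex {
	atomic_uintptr_t owner;	/* task pointer | flags, 0 when unlocked */
};

/* Strip the flag bits and return the owning task, or NULL if unlocked. */
static struct sketch_task *sketch_mutex_owner(struct sketch_mutex *m)
{
	uintptr_t word = atomic_load_explicit(&m->owner, memory_order_relaxed);

	return (struct sketch_task *)(word & ~SKETCH_FLAG_MASK);
}

/* What a caller like mutex_is_locked_by() could use instead of mutex->owner. */
static bool sketch_mutex_is_locked_by(struct sketch_mutex *m,
				      struct sketch_task *task)
{
	/* The pointer itself must not collide with the flag bits. */
	assert(((uintptr_t)task & SKETCH_FLAG_MASK) == 0);
	return sketch_mutex_owner(m) == task;
}
```

With something along these lines, direct users of the field would not
need to know how the word is encoded.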
Cheers,
Longman