Message-ID: <22831be0d0e558768007ddc7a1e90fdd@codeaurora.org>
Date: Fri, 28 Jul 2017 12:11:35 -0700
From: Vikram Mulukutla <markivx@...eaurora.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: qiaozhou <qiaozhou@...micro.com>,
Thomas Gleixner <tglx@...utronix.de>,
John Stultz <john.stultz@...aro.org>, sboyd@...eaurora.org,
LKML <linux-kernel@...r.kernel.org>,
Wang Wilbur <wilburwang@...micro.com>,
Marc Zyngier <marc.zyngier@....com>,
Will Deacon <will.deacon@....com>,
linux-kernel-owner@...r.kernel.org, sudeep.holla@....com
Subject: Re: [Question]: try to fix contention between expire_timers and try_to_del_timer_sync

On 2017-07-28 02:28, Peter Zijlstra wrote:
> On Thu, Jul 27, 2017 at 06:10:34PM -0700, Vikram Mulukutla wrote:
>
>> I think we should have this discussion now - I brought this up earlier [1]
>> and I promised a test case that I completely forgot about - but here it
>> is (attached). Essentially a Big CPU in an acquire-check-release loop
>> will have an unfair advantage over a little CPU concurrently attempting
>> to acquire the same lock, in spite of the ticket implementation. If the
>> Big CPU needs the little CPU to make forward progress : livelock.
>
> This needs to be fixed in hardware. There really isn't anything the
> software can sanely do about it.
>
> It also doesn't have anything to do with the spinlock implementation.
> Ticket or not, it's a fundamental problem of LL/SC. Any situation where
> we use atomics for fwd progress guarantees this can happen.
>
Agreed, it seems like trying to build a fair SW protocol over unfair HW.
But if we can minimally change such loop constructs to address this (all
instances I've seen so far use cpu_relax), it would save a lot of hours
spent debugging these problems. There are a lot of big.LITTLE devices out
there :-)

It's also possible that such a workaround may help contention performance,
since the big CPU may have to wait for, say, a tick before breaking out of
that loop (the non-livelock scenario where the entire loop isn't in a
critical section).
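
For reference, here is a rough userspace sketch of the loop shape being
discussed: take the lock, check a condition, drop the lock, retry. It is
not the attached test case and not kernel code. The names ticket_lock,
poll_done and backoff are made up for illustration, and the backoff
between attempts is only an assumption about what a "minimal change" to
such loops could look like, not something proposed in this thread. It
does not remove the underlying LL/SC unfairness.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace ticket lock, not the kernel's arch_spinlock_t. */
struct ticket_lock {
    atomic_uint next;   /* next ticket to hand out */
    atomic_uint owner;  /* ticket currently being served */
};

void ticket_lock_acquire(struct ticket_lock *l)
{
    /* Take a ticket, then wait until it is being served. */
    unsigned int me = atomic_fetch_add_explicit(&l->next, 1,
                                                memory_order_relaxed);
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        ; /* kernel loops would call cpu_relax() here */
}

void ticket_lock_release(struct ticket_lock *l)
{
    /* Serve the next ticket. */
    atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}

/* Hypothetical pause that leaves the lock's cacheline alone for a while. */
void backoff(unsigned int spins)
{
    for (volatile unsigned int i = 0; i < spins; i++)
        ;
}

/*
 * Acquire-check-release loop. Returns true once *done is observed set
 * while holding the lock, or false after max_tries attempts.
 * *done is assumed to be written only with the lock held.
 */
bool poll_done(struct ticket_lock *l, const bool *done,
               unsigned int max_tries, unsigned int backoff_spins)
{
    for (unsigned int i = 0; i < max_tries; i++) {
        ticket_lock_acquire(l);
        bool seen = *done;
        ticket_lock_release(l);
        if (seen)
            return true;
        /* Assumption: back off so a slower waiter can get its ticket. */
        backoff(backoff_spins);
    }
    return false;
}
```

Without the backoff call, the polling CPU goes straight back to the lock
word after every release, which is the pattern where the faster core
tends to win.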

> The little core (or really any core) should hold on to the locked
> cacheline for a while and not insta relinquish it. Giving it a chance to
> reach the SC.

Thanks,
Vikram
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project