Message-ID: <e1cc02c5e7dfd4d6bec937b6dc97bfc7@codeaurora.org>
Date: Thu, 27 Jul 2017 18:10:34 -0700
From: Vikram Mulukutla <markivx@...eaurora.org>
To: qiaozhou <qiaozhou@...micro.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
John Stultz <john.stultz@...aro.org>, sboyd@...eaurora.org,
LKML <linux-kernel@...r.kernel.org>,
Wang Wilbur <wilburwang@...micro.com>,
Marc Zyngier <marc.zyngier@....com>,
Will Deacon <will.deacon@....com>,
Peter Zijlstra <peterz@...radead.org>,
linux-kernel-owner@...r.kernel.org, sudeep.holla@....com
Subject: Re: [Question]: try to fix contention between expire_timers and
try_to_del_timer_sync
cc: Sudeep Holla
On 2017-07-26 18:29, qiaozhou wrote:
> On 2017-07-26 22:16, Thomas Gleixner wrote:
>> On Wed, 26 Jul 2017, qiaozhou wrote:
>>
>> Cc'ed ARM folks.
>>
<snip>
>>
>> For that particular timer case we can clear base->running_timer w/o
>> the lock held (see patch below), but this kind of
>>
>> lock -> test -> unlock -> retry
>>
>> loops are all over the place in the kernel, so this is going to hurt
>> you sooner than later in some other place.
> It's true. This is the way spinlocks are normally and widely used in
> the kernel. I'll also ask the ARM experts whether we can do something
> to avoid or reduce the chance of such an issue. ARMv8.1 has a single
> instruction (ldadda) to replace the ldaxr/stxr loop. Hope it can
> improve things and reduce the chance.
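
To illustrate what qiaozhou is referring to: the two sequences look
roughly like this (a sketch with illustrative function names, not the
kernel's actual atomics implementation):

#include <linux/types.h>

/* Pre-ARMv8.1 LL/SC: the stxr can fail and force a software retry,
 * and a faster CPU can keep winning the exclusive monitor. */
static inline u32 fetch_add_llsc(u32 *p, u32 val)
{
	u32 old, new, fail;

	asm volatile(
	"1:	ldaxr	%w0, [%3]\n"
	"	add	%w1, %w0, %w4\n"
	"	stxr	%w2, %w1, [%3]\n"
	"	cbnz	%w2, 1b\n"
	: "=&r" (old), "=&r" (new), "=&r" (fail)
	: "r" (p), "r" (val)
	: "memory");
	return old;
}

/* ARMv8.1 LSE: a single instruction with no software retry loop,
 * which leaves the fairness arbitration to the hardware. */
static inline u32 fetch_add_lse(u32 *p, u32 val)
{
	u32 old;

	asm volatile(
	"	ldadda	%w1, %w0, [%2]\n"
	: "=&r" (old)
	: "r" (val), "r" (p)
	: "memory");
	return old;
}
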
I think we should have this discussion now - I brought this up earlier
[1] and I promised a test case that I completely forgot about - but here
it is (attached). Essentially a Big CPU in an acquire-check-release loop
will have an unfair advantage over a little CPU concurrently attempting
to acquire the same lock, in spite of the ticket implementation. If the
Big CPU needs the little CPU to make forward progress: livelock.
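
The construct looks roughly like this (a minimal sketch with made-up
names, not the actual timer code):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(shared_lock);
static bool other_cpu_done;

/* Runs on the Big CPU, possibly with interrupts disabled: */
static void retry_until_done(void)
{
	unsigned long flags;

	for (;;) {
		spin_lock_irqsave(&shared_lock, flags);
		if (other_cpu_done) {
			spin_unlock_irqrestore(&shared_lock, flags);
			break;
		}
		spin_unlock_irqrestore(&shared_lock, flags);
		/*
		 * The Big CPU arrives back at spin_lock_irqsave()
		 * almost immediately and keeps winning the lock, so
		 * the little CPU that needs it to set other_cpu_done
		 * may never get in: livelock.
		 */
		cpu_relax();
	}
}
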
We've run into the same loop construct in other spots in the kernel, and
the reason that a real symptom is so rare is that the retry loop on the
'Big' CPU needs to be interrupted just once by, say, an IRQ/FIQ, and the
livelock is broken. If the entire retry loop is within an
interrupt-disabled critical section, then the odds of livelocking are
much higher.

An example of the problem on a previous kernel is here [2]. Changes to
the workqueue code since then may have fixed this particular instance.

One solution was to use udelay(1) in such loops instead of cpu_relax(),
but that's not very 'relaxing'. I'm not sure if there's something we
could do within the ticket spin-lock implementation to deal with this.
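
In the sketch above, that workaround amounts to replacing the
cpu_relax() with a real back-off (udelay() is in <linux/delay.h>):

		spin_unlock_irqrestore(&shared_lock, flags);
		/*
		 * Give the little CPU a window in which the Big CPU is
		 * genuinely not contending for the lock.
		 */
		udelay(1);
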
Note that I ran my test on a 4.9 kernel, so it didn't include any
spinlock implementation changes since then. The test schedules two
threads, one on a big CPU and one on a little CPU. The big CPU thread
does the lock/unlock/retry loop for a full 1 second with interrupts
disabled, while the little CPU attempts to acquire the same lock but
enables interrupts after every successful lock+unlock.

With unfairness, the little CPU may take up to 1 second (or at the very
least several milliseconds) just to acquire the lock once. This varies
depending on the IPC difference and the frequencies of the big and
little ARM64 CPUs:

Big CPU frequency | Little CPU frequency | Max time for little CPU to acquire lock
------------------+----------------------+----------------------------------------
2GHz              | 1.5GHz               | 133 microseconds
2GHz              | 300MHz               | 734 milliseconds
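
For reference, the core of the test is roughly the following (a
simplified sketch of the attached patch; the names and the
thread-pinning plumbing via kthread_bind() are illustrative):

#include <linux/kthread.h>
#include <linux/ktime.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(test_lock);

/* Bound to the big CPU: */
static int big_thread_fn(void *unused)
{
	unsigned long flags;
	u64 end = ktime_get_ns() + NSEC_PER_SEC;

	local_irq_save(flags);
	while (ktime_get_ns() < end) {
		spin_lock(&test_lock);
		spin_unlock(&test_lock);
		/* No IRQ window here; we retry with interrupts off. */
	}
	local_irq_restore(flags);
	return 0;
}

/* Bound to the little CPU: */
static int little_thread_fn(void *unused)
{
	unsigned long flags;
	u64 t, max_wait = 0;
	int i;

	for (i = 0; i < 100; i++) {
		local_irq_save(flags);
		t = ktime_get_ns();
		spin_lock(&test_lock);		/* measure acquisition time */
		max_wait = max(max_wait, ktime_get_ns() - t);
		spin_unlock(&test_lock);
		local_irq_restore(flags);	/* IRQ window between tries */
	}
	pr_info("little CPU max lock wait: %llu ns\n", max_wait);
	return 0;
}
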
Thanks,
Vikram
[1] - https://lkml.org/lkml/2016/11/17/934
[2] - https://goo.gl/uneFjt
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
Attachment: "0001-measure-spinlock-fairness-across-differently-capable.patch" (text/x-diff, 7285 bytes)