Message-ID: <13216fb113d64a508b7908b500612335@codeaurora.org>
Date: Sat, 01 Oct 2016 11:45:18 -0400
From: bdegraaf@...eaurora.org
To: Robin Murphy <robin.murphy@....com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will.deacon@....com>,
Timur Tabi <timur@...eaurora.org>,
Nathan Lynch <nathan_lynch@...tor.com>,
linux-kernel@...r.kernel.org,
Christopher Covington <cov@...eaurora.org>,
linux-arm-kernel@...ts.infradead.org
Subject: Re: [RFC] arm64: Enforce observed order for spinlock and data

On 2016-09-30 14:43, Robin Murphy wrote:
>> + * so LSE needs an explicit barrier here as well. Without this, the
>> + * changed contents of the area protected by the spinlock could be
>> + * observed prior to the lock.
>
> What is that last sentence supposed to mean? If the lock is free, then
> surely any previous writes to the data it's protecting would have
> already been observed by the release semantics of the previous unlock?
> If the lock is currently held, what do we care about the state of the
> data while we're still spinning on the lock itself? And if someone's
> touching the data without having acquired *or* released the lock, why
> is there a lock in the first place?
>
> This seems like a very expensive way of papering over broken callers :/
>
> Robin.
>
Thanks for your question.

First off, I saw no negative impact to performance as a result of
introducing these barriers, running a wide variety of use cases on both
mobile and server-class devices ranging from 4 to 24 CPUs.

Yes, it does protect lockref, which observes the spinlocks in an
unconventional way. In fact, with this code in place, the performance
of Linus' test which runs stat like crazy actually improved in the
range of 1-2% (I suspect this is due to fewer failed cmpxchg attempts
caused by contention on the protected count that lockref uses).
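
To make that unconventional usage concrete, here is a rough userspace
model of the lockref fast path. This is only an illustrative sketch, not
the kernel's lib/lockref.c: the word layout, the LOCKED_BIT constant,
and the function name are made up for the example. The point is that
the lock and the count share one word, and the count is bumped with a
compare-exchange only while the lock half is *observed* to be free,
without the lock ever being acquired:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: low 32 bits hold the count, bit 32 models the lock. */
#define LOCKED_BIT (1ULL << 32)

static bool lockref_get_if_unlocked(_Atomic uint64_t *lock_count)
{
	uint64_t old = atomic_load_explicit(lock_count, memory_order_relaxed);

	/* Only attempt the lockless update while the lock half reads as free. */
	while (!(old & LOCKED_BIT)) {
		uint64_t new = old + 1;		/* bump the count half */

		/* On failure, 'old' is refreshed and the lock check repeats. */
		if (atomic_compare_exchange_weak_explicit(lock_count, &old, new,
							  memory_order_relaxed,
							  memory_order_relaxed))
			return true;	/* count updated without taking the lock */
	}
	return false;		/* lock was held: caller falls back to locking */
}

With everything relaxed as above, nothing orders the count update
against the data the lock protects, which is the kind of window the
barriers in this patch are aimed at.
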
The lockref code can be made compliant, but not by a single
load-acquire; it has to take the lock. Turning off
CONFIG_ARCH_USE_CMPXCHG_LOCKREF is the most obvious solution, as it
forces lockref.c to take the lock. That, however, comes at a very high
performance cost: 30-50% on Linus' stat test on a 24-core system. For
larger systems, this performance gap will only get worse.
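
For contrast, here is roughly what the non-cmpxchg fallback amounts to
(again just a sketch using pthread spinlocks, not the actual kernel
code): every count update takes and releases the lock, which is correct
but serializes all CPUs on the lock word, and that is presumably where
the 30-50% cost mentioned above comes from.

#include <pthread.h>

struct slow_lockref {
	pthread_spinlock_t lock;
	int count;
};

static void slow_lockref_init(struct slow_lockref *lr)
{
	pthread_spin_init(&lr->lock, PTHREAD_PROCESS_PRIVATE);
	lr->count = 1;			/* initial reference */
}

static void slow_lockref_get(struct slow_lockref *lr)
{
	pthread_spin_lock(&lr->lock);	/* acquire: ordered against the data */
	lr->count++;
	pthread_spin_unlock(&lr->lock);	/* release: publishes the update */
}
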
With the above in mind, it seems that supporting lockref's unorthodox
method of dealing with the lock is the better alternative, as it helps,
rather than hurts, arm64 performance.

Brent