Message-ID: <54f5bdcf-8b6e-0e53-8f28-5d5c3eb5f7ad@redhat.com>
Date: Thu, 18 Nov 2021 13:12:03 -0500
From: Waiman Long <longman@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>,
Muchun Song <songmuchun@...edance.com>
Cc: mingo@...hat.com, will@...nel.org, boqun.feng@...il.com,
linux-kernel@...r.kernel.org, duanxiongchun@...edance.com,
zhengqi.arch@...edance.com
Subject: Re: [PATCH] locking/rwsem: Optimize down_read_trylock() under highly
contended case
On 11/18/21 07:57, Peter Zijlstra wrote:
> On Thu, Nov 18, 2021 at 05:44:55PM +0800, Muchun Song wrote:
>
>> By using the above benchmark, the real executing times on an x86-64 system
>> before and after the patch were:
> What kind of x86_64?
>
>>                   Before Patch    After Patch
>>  # of Threads         real            real       reduced by
>>  ------------       --------        --------     ----------
>>        1              65,373          65,206        ~0.0%
>>        4              15,467          15,378        ~0.5%
>>       40               6,214           5,528       ~11.0%
>>
>> For the uncontended case, the new down_read_trylock() is the same as
>> before. For the contended cases, the new down_read_trylock() is faster
>> than before. The more contended the lock, the greater the speedup.
>>
>> Signed-off-by: Muchun Song <songmuchun@...edance.com>
>> ---
>> kernel/locking/rwsem.c | 11 ++++-------
>> 1 file changed, 4 insertions(+), 7 deletions(-)
>>
>> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
>> index c51387a43265..ef2b2a3f508c 100644
>> --- a/kernel/locking/rwsem.c
>> +++ b/kernel/locking/rwsem.c
>> @@ -1249,17 +1249,14 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
>>
>> DEBUG_RWSEMS_WARN_ON(sem->magic != sem, sem);
>>
>> - /*
>> - * Optimize for the case when the rwsem is not locked at all.
>> - */
>> - tmp = RWSEM_UNLOCKED_VALUE;
>> - do {
>> + tmp = atomic_long_read(&sem->count);
>> + while (!(tmp & RWSEM_READ_FAILED_MASK)) {
>> if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
>> - tmp + RWSEM_READER_BIAS)) {
>> + tmp + RWSEM_READER_BIAS)) {
>> rwsem_set_reader_owned(sem);
>> return 1;
>> }
>> - } while (!(tmp & RWSEM_READ_FAILED_MASK));
>> + }
>> return 0;
>> }
> This is weird... so the only difference is that leading load, but given
> contention you'd expect that load to stall and, since it's a
> non-exclusive load, to get stolen by a competing CPU. Whereas the old
> code would start with a cmpxchg, which obviously will also stall, but
> does an exclusive load.
>
> And the thinking is that the exclusive load and the presence of the
> cmpxchg loop would keep the line on that CPU for a little while and
> progress is made.
>
> Clearly this isn't working as expected. Also I suppose it would need
> wider testing...
For a contended case, doing a shared read first before doing an
exclusive cmpxchg can certainly help to reduce cacheline thrashing. I
have no objection to making this change.
I believe most of the other trylock functions do a read first before
doing an atomic operation. In essence, we assume the use of trylock
means the callers are expecting a contended lock, whereas callers of
the regular *lock() functions are expecting an uncontended lock.
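For contrast, the pre-patch shape in the same sketch notation (same
illustrative constants as above) optimistically starts from the
unlocked value, which is the cheapest possible fast path when the lock
is expected to be free:

/*
 * Cmpxchg-first variant (the pre-patch shape): assume the lock is
 * unlocked and go straight to the RMW. Best when contention is rare,
 * since the uncontended fast path is a single cmpxchg with no
 * preceding load.
 */
static bool read_trylock_cmpxchg_first(atomic_ulong *count)
{
	unsigned long tmp = 0;	/* optimistically assume unlocked */

	do {
		if (atomic_compare_exchange_weak_explicit(count, &tmp,
					tmp + READER_BIAS,
					memory_order_acquire,
					memory_order_relaxed))
			return true;
	} while (!(tmp & READ_FAILED_MASK));
	return false;
}

Under contention every failed cmpxchg still issues an exclusive
(read-for-ownership) access, which is exactly the cacheline traffic
the load-first variant avoids.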
Acked-by: Waiman Long <longman@...hat.com>
-Longman