lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f34ddf7f-3ed9-6118-8106-eb9df110c44c@redhat.com>
Date:   Mon, 31 Jul 2023 21:59:33 -0400
From:   Waiman Long <longman@...hat.com>
To:     bibo mao <maobibo@...ngson.cn>, guoren@...nel.org,
        David.Laight@...LAB.COM, will@...nel.org, peterz@...radead.org,
        mingo@...hat.com
Cc:     linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-riscv@...ts.infradead.org, Guo Ren <guoren@...ux.alibaba.com>
Subject: Re: [PATCH V2] asm-generic: ticket-lock: Optimize
 arch_spin_value_unlocked

On 7/31/23 21:37, bibo mao wrote:
>
> 在 2023/7/31 23:16, Waiman Long 写道:
>> On 7/30/23 22:33, guoren@...nel.org wrote:
>>> From: Guo Ren <guoren@...ux.alibaba.com>
>>>
>>> The arch_spin_value_unlocked would cause an unnecessary memory
>>> access to the contended value. Although it won't cause a significant
>>> performance gap in most architectures, the arch_spin_value_unlocked
>>> argument contains enough information. Thus, remove unnecessary
>>> atomic_read in arch_spin_value_unlocked().
>>>
>>> The caller of arch_spin_value_unlocked() could benefit from this
>>> change. Currently, the only caller is lockref.
>>>
>>> Signed-off-by: Guo Ren <guoren@...nel.org>
>>> Cc: Waiman Long <longman@...hat.com>
>>> Cc: David Laight <David.Laight@...LAB.COM>
>>> Cc: Peter Zijlstra <peterz@...radead.org>
>>> Signed-off-by: Guo Ren <guoren@...ux.alibaba.com>
>>> ---
>>> Changelog
>>> V2:
>>>    - Fixup commit log with Waiman advice.
>>>    - Add Waiman comment in the commit msg.
>>> ---
>>>    include/asm-generic/spinlock.h | 16 +++++++++-------
>>>    1 file changed, 9 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
>>> index fdfebcb050f4..90803a826ba0 100644
>>> --- a/include/asm-generic/spinlock.h
>>> +++ b/include/asm-generic/spinlock.h
>>> @@ -68,11 +68,18 @@ static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
>>>        smp_store_release(ptr, (u16)val + 1);
>>>    }
>>>    +static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
>>> +{
>>> +    u32 val = lock.counter;
>>> +
>>> +    return ((val >> 16) == (val & 0xffff));
>>> +}
>>> +
>>>    static __always_inline int arch_spin_is_locked(arch_spinlock_t *lock)
>>>    {
>>> -    u32 val = atomic_read(lock);
>>> +    arch_spinlock_t val = READ_ONCE(*lock);
>>>    -    return ((val >> 16) != (val & 0xffff));
>>> +    return !arch_spin_value_unlocked(val);
>>>    }
>>>      static __always_inline int arch_spin_is_contended(arch_spinlock_t *lock)
>>> @@ -82,11 +89,6 @@ static __always_inline int arch_spin_is_contended(arch_spinlock_t *lock)
>>>        return (s16)((val >> 16) - (val & 0xffff)) > 1;
>>>    }
>>>    -static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
>>> -{
>>> -    return !arch_spin_is_locked(&lock);
>>> -}
>>> -
>>>    #include <asm/qrwlock.h>
>>>      #endif /* __ASM_GENERIC_SPINLOCK_H */
>> I am fine with the current change. However, modern optimizing compiler should be able to avoid the redundant memory read anyway. So this patch may not have an impact from the performance point of view.
> arch_spin_value_unlocked is called with lockref like this:
>
> #define CMPXCHG_LOOP(CODE, SUCCESS) do {                                        \
>          int retry = 100;                                                        \
>          struct lockref old;                                                     \
>          BUILD_BUG_ON(sizeof(old) != 8);                                         \
>          old.lock_count = READ_ONCE(lockref->lock_count);                        \
>          while (likely(arch_spin_value_unlocked(old.lock.rlock.raw_lock))) {     \
>
> With modern optimizing compiler, Is it possible that old value of
> old.lock.rlock.raw_lock is cached in register, despite that try_cmpxchg64_relaxed
> modifies the memory of old.lock_count with new value?

What I meant is that the call to arch_spin_value_unlocked() as it is 
today will not generate 2 memory reads of the same location with or 
without the patch. Of course, a new memory read will be needed after a 
failed cmpxchg().

Cheers,
Longman

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ