lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 7 Aug 2023 20:36:55 +0200
From:   Mateusz Guzik <mjguzik@...il.com>
To:     guoren@...nel.org
Cc:     David.Laight@...LAB.COM, will@...nel.org, peterz@...radead.org,
        mingo@...hat.com, longman@...hat.com, linux-arch@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-riscv@...ts.infradead.org,
        Guo Ren <guoren@...ux.alibaba.com>
Subject: Re: [PATCH V2] asm-generic: ticket-lock: Optimize
 arch_spin_value_unlocked

On Sun, Jul 30, 2023 at 10:33:08PM -0400, guoren@...nel.org wrote:
> From: Guo Ren <guoren@...ux.alibaba.com>
> 
> The arch_spin_value_unlocked would cause an unnecessary memory
> access to the contended value. Although it won't cause a significant
> performance gap in most architectures, the arch_spin_value_unlocked
> argument contains enough information. Thus, remove unnecessary
> atomic_read in arch_spin_value_unlocked().
> 
> The caller of arch_spin_value_unlocked() could benefit from this
> change. Currently, the only caller is lockref.
> 

Have you verified you are getting an extra memory access from this in
lockref? What architecture is it?

I have no opinion about the patch itself, I will note though that the
argument to the routine is *not* the actual memory-shared lockref,
instead it's something from the local copy obtained with READ_ONCE
from the real thing. So I would be surprised if the stock routine was
generating accesses to that sucker.

Nonetheless, if the patched routine adds nasty asm, that would be nice
to sort out.

FWIW on x86-64 qspinlock is used (i.e. not the stuff you are patching)
and I verified there are only 2 memory accesses -- the initial READ_ONCE
and later cmpxchg. I don't know which archs *don't* use qspinlock.

It also turns out generated asm is quite atrocious and cleaning it up
may yield a small win under more traffic. Maybe I'll see about it later
this week.

For example, disassembling lockref_put_return:
<+0>:     mov    (%rdi),%rax            <-- initial load, expected
<+3>:     mov    $0x64,%r8d
<+9>:     mov    %rax,%rdx
<+12>:    test   %eax,%eax              <-- retries loop back here
					<-- this is also the unlocked
					    check
<+14>:    jne    0xffffffff8157aba3 <lockref_put_return+67>
<+16>:    mov    %rdx,%rsi
<+19>:    mov    %edx,%edx
<+21>:    sar    $0x20,%rsi
<+25>:    lea    -0x1(%rsi),%ecx        <-- new.count--;
<+28>:    shl    $0x20,%rcx
<+32>:    or     %rcx,%rdx
<+35>:    test   %esi,%esi
<+37>:    jle    0xffffffff8157aba3 <lockref_put_return+67>
<+39>:    lock cmpxchg %rdx,(%rdi)      <-- the attempt to change
<+44>:    jne    0xffffffff8157ab9a <lockref_put_return+58>
<+46>:    shr    $0x20,%rdx
<+50>:    mov    %rdx,%rax
<+53>:    jmp    0xffffffff81af8540 <__x86_return_thunk>
<+58>:    mov    %rax,%rdx
<+61>:    sub    $0x1,%r8d              <-- retry count check
<+65>:    jne    0xffffffff8157ab6c <lockref_put_return+12> <-- go back
<+67>:    mov    $0xffffffff,%eax
<+72>:    jmp    0xffffffff81af8540 <__x86_return_thunk>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ