linux-kernel - Re: [RFC 10/12] x86, rwsem: simplify __down

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160203081016.GD32652@gmail.com>
Date:	Wed, 3 Feb 2016 09:10:16 +0100
From:	Ingo Molnar <mingo@...nel.org>
To:	Michal Hocko <mhocko@...nel.org>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	"David S. Miller" <davem@...emloft.net>,
	Tony Luck <tony.luck@...el.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Chris Zankel <chris@...kel.net>,
	Max Filippov <jcmvbkbc@...il.com>, x86@...nel.org,
	linux-alpha@...r.kernel.org, linux-ia64@...r.kernel.org,
	linux-s390@...r.kernel.org, linux-sh@...r.kernel.org,
	sparclinux@...r.kernel.org, linux-xtensa@...ux-xtensa.org,
	linux-arch@...r.kernel.org, Michal Hocko <mhocko@...e.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	"Paul E. McKenney" <paulmck@...ibm.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [RFC 10/12] x86, rwsem: simplify __down_write


* Michal Hocko <mhocko@...nel.org> wrote:

> From: Michal Hocko <mhocko@...e.com>
> 
> x86 implementation of __down_write is using inline asm to optimize the
> code flow. This however requires that it has go over an additional hop
> for the slow path call_rwsem_down_write_failed which has to
> save_common_regs/restore_common_regs to preserve the calling convention.
> This, however doesn't add much because the fast path only saves one
> register push/pop (rdx) when compared to the generic implementation:
> 
> Before:
> 0000000000000019 <down_write>:
>   19:   e8 00 00 00 00          callq  1e <down_write+0x5>
>   1e:   55                      push   %rbp
>   1f:   48 ba 01 00 00 00 ff    movabs $0xffffffff00000001,%rdx
>   26:   ff ff ff
>   29:   48 89 f8                mov    %rdi,%rax
>   2c:   48 89 e5                mov    %rsp,%rbp
>   2f:   f0 48 0f c1 10          lock xadd %rdx,(%rax)
>   34:   85 d2                   test   %edx,%edx
>   36:   74 05                   je     3d <down_write+0x24>
>   38:   e8 00 00 00 00          callq  3d <down_write+0x24>
>   3d:   65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
>   44:   00 00
>   46:   5d                      pop    %rbp
>   47:   48 89 47 38             mov    %rax,0x38(%rdi)
>   4b:   c3                      retq
> 
> After:
> 0000000000000019 <down_write>:
>   19:   e8 00 00 00 00          callq  1e <down_write+0x5>
>   1e:   55                      push   %rbp
>   1f:   48 b8 01 00 00 00 ff    movabs $0xffffffff00000001,%rax
>   26:   ff ff ff
>   29:   48 89 e5                mov    %rsp,%rbp
>   2c:   53                      push   %rbx
>   2d:   48 89 fb                mov    %rdi,%rbx
>   30:   f0 48 0f c1 07          lock xadd %rax,(%rdi)
>   35:   48 85 c0                test   %rax,%rax
>   38:   74 05                   je     3f <down_write+0x26>
>   3a:   e8 00 00 00 00          callq  3f <down_write+0x26>
>   3f:   65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
>   46:   00 00
>   48:   48 89 43 38             mov    %rax,0x38(%rbx)
>   4c:   5b                      pop    %rbx
>   4d:   5d                      pop    %rbp
>   4e:   c3                      retq

I'm not convinced about the removal of this optimization at all.

> This doesn't seem to justify the code obfuscation and complexity. Use
> the generic implementation instead.
> 
> Signed-off-by: Michal Hocko <mhocko@...e.com>
> ---
>  arch/x86/include/asm/rwsem.h | 17 +++++------------
>  arch/x86/lib/rwsem.S         |  9 ---------
>  2 files changed, 5 insertions(+), 21 deletions(-)

Turn the argument around, would we be willing to save two instructions off the 
fast path of a commonly used locking construct, with such a simple optimization:

>  arch/x86/include/asm/rwsem.h | 17 ++++++++++++-----
>  arch/x86/lib/rwsem.S         |  9 +++++++++
>  2 files changed, 21 insertions(+), 5 deletions(-)

?

Yes!

So, if you want to remove the assembly code - can we achieve that without hurting 
the generated fast path, using the compiler?

Thanks,

	Ingo