lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZR2VitjPb6Miksim@gmail.com>
Date:   Wed, 4 Oct 2023 18:40:42 +0200
From:   Ingo Molnar <mingo@...nel.org>
To:     Uros Bizjak <ubizjak@...il.com>
Cc:     x86@...nel.org, linux-kernel@...r.kernel.org,
        Andy Lutomirski <luto@...nel.org>,
        Nadav Amit <namit@...are.com>, Brian Gerst <brgerst@...il.com>,
        Denys Vlasenko <dvlasenk@...hat.com>,
        "H . Peter Anvin" <hpa@...or.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Borislav Petkov <bp@...en8.de>,
        Josh Poimboeuf <jpoimboe@...hat.com>
Subject: Re: [PATCH 4/4] x86/percpu: Use C for percpu read/write accessors


* Ingo Molnar <mingo@...nel.org> wrote:

> 
> * Uros Bizjak <ubizjak@...il.com> wrote:
> 
> > The percpu code mostly uses inline assembly. Using segment qualifiers
> > allows to use C code instead, which enables the compiler to perform
> > various optimizations (e.g. propagation of memory arguments). Convert
> > percpu read and write accessors to C code, so the memory argument can
> > be propagated to the instruction that uses this argument.
> > 
> > Some examples of propagations:
> > 
> > a) into sign/zero extensions:
> > 
> >  110b54:       65 0f b6 05 00 00 00    movzbl %gs:0x0(%rip),%eax
> >  11ab90:       65 0f b6 15 00 00 00    movzbl %gs:0x0(%rip),%edx
> >  14484a:       65 0f b7 35 00 00 00    movzwl %gs:0x0(%rip),%esi
> >  1a08a9:       65 0f b6 43 78          movzbl %gs:0x78(%rbx),%eax
> >  1a08f9:       65 0f b6 43 78          movzbl %gs:0x78(%rbx),%eax
> > 
> >  4ab29a:       65 48 63 15 00 00 00    movslq %gs:0x0(%rip),%rdx
> >  4be128:       65 4c 63 25 00 00 00    movslq %gs:0x0(%rip),%r12
> >  547468:       65 48 63 1f             movslq %gs:(%rdi),%rbx
> >  5474e7:       65 48 63 0a             movslq %gs:(%rdx),%rcx
> >  54d05d:       65 48 63 0d 00 00 00    movslq %gs:0x0(%rip),%rcx
> 
> Could you please also quote a 'before' assembly sequence, at least once
> per group of propagations?

Ie. for any changes to x86 code generation, please follow the changelog 
format of:

   7c097ca50d2b ("x86/percpu: Do not clobber %rsi in percpu_{try_,}cmpxchg{64,128}_op")

   ...
	Move the load of %rsi outside inline asm, so the compiler can
	reuse the value. The code in slub.o improves from:

	    55ac:	49 8b 3c 24          	mov    (%r12),%rdi
	    55b0:	48 8d 4a 40          	lea    0x40(%rdx),%rcx
	    55b4:	49 8b 1c 07          	mov    (%r15,%rax,1),%rbx
	    55b8:	4c 89 f8             	mov    %r15,%rax
	    55bb:	48 8d 37             	lea    (%rdi),%rsi
	    55be:	e8 00 00 00 00       	callq  55c3 <...>
				55bf: R_X86_64_PLT32	this_cpu_cmpxchg16b_emu-0x4
	    55c3:	75 a3                	jne    5568 <...>
	    55c5:	...

	 0000000000000000 <.altinstr_replacement>:
	   5:	65 48 0f c7 0f       	cmpxchg16b %gs:(%rdi)

	to:

	    55ac:	49 8b 34 24          	mov    (%r12),%rsi
	    55b0:	48 8d 4a 40          	lea    0x40(%rdx),%rcx
	    55b4:	49 8b 1c 07          	mov    (%r15,%rax,1),%rbx
	    55b8:	4c 89 f8             	mov    %r15,%rax
	    55bb:	e8 00 00 00 00       	callq  55c0 <...>
				55bc: R_X86_64_PLT32	this_cpu_cmpxchg16b_emu-0x4
	    55c0:	75 a6                	jne    5568 <...>
	    55c2:	...

	Where the alternative replacement instruction now uses %rsi:

	 0000000000000000 <.altinstr_replacement>:
	   5:	65 48 0f c7 0e       	cmpxchg16b %gs:(%rsi)

	The instruction (effectively a reg-reg move) at 55bb: in the original
	assembly is removed. Also, both the CALL and replacement CMPXCHG16B
	are 5 bytes long, removing the need for NOPs in the asm code.
   ...

Thanks,

	Ingo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ