linux-kernel - Re: [PATCH v3] x86, mem: move memmove to out of line assembler

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <a753c608-813a-f880-67ca-f16fe503b152@rasmusvillemoes.dk>
Date:   Wed, 28 Sep 2022 09:24:04 +0200
From:   Rasmus Villemoes <linux@...musvillemoes.dk>
To:     Nick Desaulniers <ndesaulniers@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>
Cc:     x86@...nel.org, "H . Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Kees Cook <keescook@...omium.org>,
        linux-kernel@...r.kernel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        llvm@...ts.linux.dev, Andy Lutomirski <luto@...nel.org>
Subject: Re: [PATCH v3] x86, mem: move memmove to out of line assembler

On 27/09/2022 23.02, Nick Desaulniers wrote:

> +	/* Decide forward/backward copy mode */
> +	cmpl	dest, src
> +	jb	.Lbackwards_header

I know you're mostly just moving existing code, but for my own education
I'd like to understand this.

> +	/*
> +	 * movs instruction have many startup latency
> +	 * so we handle small size by general register.
> +	 */
> +	cmpl	$680, n
> +	jb	.Ltoo_small_forwards

OK, this I get, there's some overhead, and hence we need _some_ cutoff
value; 680 is probably chosen by some trial-and-error, but the exact
value likely doesn't matter too much.

> +	/*
> +	 * movs instruction is only good for aligned case.
> +	 */
> +	movl	src, tmp0
> +	xorl	dest, tmp0
> +	andl	$0xff, tmp0
> +	jz	.Lforward_movs

But this part I don't understand at all. This checks that the src and
dest have the same %256 value, which is a rather odd thing, and very
unlikely to ever be hit in practice. I could understand if it checked
that they were both 4 or 8 or 16-byte aligned (i.e., (src|dest)&FOO)),
or if it checked that they had the same offset within a cacheline [say
(src^dest)&0x3f].

Any idea where that comes from? Or am I just incapable of reading x86 asm?

> +.Ltoo_small_forwards:
> +	subl	$0x10, n
> +
> +	/*
> +	 * We gobble 16 bytes forward in each loop.
> +	 */
> +.L16_byteswap_forwards_loop:
> +	subl	$0x10, n
> +	movl	0*4(src), tmp0
> +	movl	1*4(src), tmp1
> +	movl	tmp0, 0*4(dest)
> +	movl	tmp1, 1*4(dest)
> +	movl	2*4(src), tmp0
> +	movl	3*4(src), tmp1
> +	movl	tmp0, 2*4(dest)
> +	movl	tmp1, 3*4(dest)
> +	leal	0x10(src), src
> +	leal	0x10(dest), dest
> +	jae	.L16_byteswap_forwards_loop
> +	addl	$0x10, n
> +	jmp	.L16_byteswap
> +
> +	/*
> +	 * Handle data forward by movs.
> +	 */
> +.p2align 4
> +.Lforward_movs:
> +	movl	-4(src, n), tmp0
> +	leal	-4(dest, n), tmp1
> +	shrl	$2, n
> +	rep	movsl
> +	movl	tmp0, (tmp1)
> +	jmp	.Ldone

So in the original code, %1 was forced to be %esi and %2 was forced to
be %edi and they were initialized by src and dest. But here I fail to
see how those registers have been properly set up before the rep movs;
your names for those are tmp0 and tmp2. You have just loaded the last
word of the source to %edi, and AFAICT %esi aka tmp2 is entirely
uninitialized at this point (the only use is in L16_byteswap).

I must be missing something. Please enlighten me.

Rasmus