lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Wed, 28 Sep 2022 09:24:04 +0200 From: Rasmus Villemoes <linux@...musvillemoes.dk> To: Nick Desaulniers <ndesaulniers@...gle.com>, Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com> Cc: x86@...nel.org, "H . Peter Anvin" <hpa@...or.com>, Peter Zijlstra <peterz@...radead.org>, Kees Cook <keescook@...omium.org>, linux-kernel@...r.kernel.org, Linus Torvalds <torvalds@...ux-foundation.org>, llvm@...ts.linux.dev, Andy Lutomirski <luto@...nel.org> Subject: Re: [PATCH v3] x86, mem: move memmove to out of line assembler On 27/09/2022 23.02, Nick Desaulniers wrote: > + /* Decide forward/backward copy mode */ > + cmpl dest, src > + jb .Lbackwards_header I know you're mostly just moving existing code, but for my own education I'd like to understand this. > + /* > + * movs instruction have many startup latency > + * so we handle small size by general register. > + */ > + cmpl $680, n > + jb .Ltoo_small_forwards OK, this I get, there's some overhead, and hence we need _some_ cutoff value; 680 is probably chosen by some trial-and-error, but the exact value likely doesn't matter too much. > + /* > + * movs instruction is only good for aligned case. > + */ > + movl src, tmp0 > + xorl dest, tmp0 > + andl $0xff, tmp0 > + jz .Lforward_movs But this part I don't understand at all. This checks that the src and dest have the same %256 value, which is a rather odd thing, and very unlikely to ever be hit in practice. I could understand if it checked that they were both 4 or 8 or 16-byte aligned (i.e., (src|dest)&FOO)), or if it checked that they had the same offset within a cacheline [say (src^dest)&0x3f]. Any idea where that comes from? Or am I just incapable of reading x86 asm? > +.Ltoo_small_forwards: > + subl $0x10, n > + > + /* > + * We gobble 16 bytes forward in each loop. > + */ > +.L16_byteswap_forwards_loop: > + subl $0x10, n > + movl 0*4(src), tmp0 > + movl 1*4(src), tmp1 > + movl tmp0, 0*4(dest) > + movl tmp1, 1*4(dest) > + movl 2*4(src), tmp0 > + movl 3*4(src), tmp1 > + movl tmp0, 2*4(dest) > + movl tmp1, 3*4(dest) > + leal 0x10(src), src > + leal 0x10(dest), dest > + jae .L16_byteswap_forwards_loop > + addl $0x10, n > + jmp .L16_byteswap > + > + /* > + * Handle data forward by movs. > + */ > +.p2align 4 > +.Lforward_movs: > + movl -4(src, n), tmp0 > + leal -4(dest, n), tmp1 > + shrl $2, n > + rep movsl > + movl tmp0, (tmp1) > + jmp .Ldone So in the original code, %1 was forced to be %esi and %2 was forced to be %edi and they were initialized by src and dest. But here I fail to see how those registers have been properly set up before the rep movs; your names for those are tmp0 and tmp2. You have just loaded the last word of the source to %edi, and AFAICT %esi aka tmp2 is entirely uninitialized at this point (the only use is in L16_byteswap). I must be missing something. Please enlighten me. Rasmus
Powered by blists - more mailing lists