Message-ID: <35ab0ccca42b4b4695f6c99b6d741b8f@AcuMS.aculab.com>
Date: Fri, 30 Sep 2022 10:14:25 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Nick Desaulniers' <ndesaulniers@...gle.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>
CC: "x86@...nel.org" <x86@...nel.org>,
"H . Peter Anvin" <hpa@...or.com>,
Peter Zijlstra <peterz@...radead.org>,
Kees Cook <keescook@...omium.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
"llvm@...ts.linux.dev" <llvm@...ts.linux.dev>,
Andy Lutomirski <luto@...nel.org>,
Rasmus Villemoes <linux@...musvillemoes.dk>
Subject: RE: [PATCH v4] x86, mem: move memmove to out of line assembler
From: Nick Desaulniers
> Sent: 28 September 2022 22:05
...
Reading it again, what is this test supposed to achieve?
> + /*
> + * movs instruction is only good for aligned case.
> + */
> + movl src, tmp0
> + xorl dest, tmp0
> + andl $0xff, tmp0
> + jz .Lforward_movs
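In C terms that test is roughly (a sketch, reading src/dest as the
raw pointer values):

	/* take the movs path only if src and dest are congruent mod 256 */
	if ((((unsigned long)src ^ (unsigned long)dest) & 0xff) == 0)
		goto forward_movs;

i.e. it checks that the two pointers have the same offset within a
256-byte block (relative alignment), not that either pointer is
itself aligned.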
The 'aligned' test would be '(src | dest) & 3' - see
the sketch after this paragraph.
(Or maybe '& 7', since some 32-bit x86 CPUs actually
do 8-byte aligned 'rep movsl' faster than 4-byte
aligned ones.)
OTOH the open-coded copy loop is likely to be slower still.
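In C, that test would be something like (sketch only, same
reading of src/dest as above):

	/* both pointers 4-byte aligned, so rep movsl needs no fixup */
	if ((((unsigned long)src | (unsigned long)dest) & 3) == 0)
		goto forward_movs;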
I've not tried measuring misaligned 'rep movsw', but
on some recent Intel CPUs normal misaligned reads cost
almost nothing - even when doing two reads/clock.
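If anyone wants numbers, a rough userspace sketch like the one
below would do (my code, not from the patch; x86 only, plain
__rdtsc() with no serialising instruction, so treat the cycle
counts as ballpark):

	#include <stdint.h>
	#include <stdio.h>
	#include <x86intrin.h>	/* __rdtsc() */

	static char buf[128 * 1024];

	/* copy 'dwords' 32-bit words with rep movsl, return elapsed cycles */
	static uint64_t time_copy(void *d, const void *s, size_t dwords)
	{
		uint64_t t0 = __rdtsc();

		asm volatile("rep movsl"
			     : "+D"(d), "+S"(s), "+c"(dwords)
			     :: "memory");
		return __rdtsc() - t0;
	}

	int main(void)
	{
		/* warm the cache, then time aligned vs misaligned source */
		time_copy(buf, buf + 64 * 1024, 4096);
		printf("aligned    %llu cycles\n", (unsigned long long)
		       time_copy(buf, buf + 64 * 1024, 4096));
		printf("misaligned %llu cycles\n", (unsigned long long)
		       time_copy(buf, buf + 64 * 1024 + 1, 4096));
		return 0;
	}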
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)