Date: Thu, 16 Sep 2010 17:29:32 +0800
From: Miao Xie <miaox@...fujitsu.com>
To: Andi Kleen <andi@...stfloor.org>
CC: Andrew Morton <akpm@...ux-foundation.org>, Ingo Molnar <mingo@...e.hu>,
	"Theodore Ts'o" <tytso@....edu>, Chris Mason <chris.mason@...cle.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	Linux Btrfs <linux-btrfs@...r.kernel.org>,
	Linux Ext4 <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH] x86_64/lib: improve the performance of memmove

On Thu, 16 Sep 2010 10:40:08 +0200, Andi Kleen wrote:
> On Thu, 16 Sep 2010 15:16:31 +0800
> Miao Xie <miaox@...fujitsu.com> wrote:
>
>> On Thu, 16 Sep 2010 08:48:25 +0200 (CEST), Andi Kleen wrote:
>>>> When the dest and the src do overlap and the memory area is large,
>>>> memmove on x86_64 is very inefficient, which leads to bad performance,
>>>> for example in btrfs's file deletion. This patch improves the
>>>> performance of memmove on x86_64 by using __memcpy_bwd() instead of a
>>>> byte copy when copying a large memory area (len > 64).
>>>
>>> I still don't understand why you don't simply use a backwards
>>> string copy (with std)? That should be much simpler and
>>> hopefully be as optimized for kernel copies on recent CPUs.
>>
>> But according to the comment of memcpy, some CPUs don't support the
>> "REP" instruction.
>
> I think you misread the comment. REP prefixes are in all x86 CPUs.
> On some very old systems it wasn't optimized very well,
> but it probably doesn't make too much sense to optimize for those.
> On newer CPUs in fact REP should usually be faster than
> an unrolled loop.
>
> I'm not sure how optimized the backwards copy is, but most likely
> it is optimized too.
>
> Here's an untested patch that implements backwards copy with string
> instructions. Could you run it through your test harness?

Ok, I'll do it.
> +
> +/*
> + * Copy memory backwards (for memmove)
> + * rdi target
> + * rsi source
> + * rdx count
> + */
> +ENTRY(memcpy_backwards):

s/:// (the ENTRY() macro already supplies the colon)

> +	CFI_STARTPROC
> +	std
> +	movq %rdi, %rax
> +	movl %edx, %ecx
> +	add %rdx, %rdi
> +	add %rdx, %rsi

-	add %rdx, %rdi
-	add %rdx, %rsi
+	addq %rdx, %rdi
+	addq %rdx, %rsi

Besides that, the addresses in %rdi/%rsi now point past the end of the
memory area that is going to be copied, so we must adjust them to point
at the last qword before the copy:

+	leaq -8(%rdi), %rdi
+	leaq -8(%rsi), %rsi

> +	shrl $3, %ecx
> +	andl $7, %edx
> +	rep movsq

The same as above: after the qword copy, the pointers must be re-adjusted
to point at the last byte of the remaining tail:

+	leaq 8(%rdi), %rdi
+	leaq 8(%rsi), %rsi
+	decq %rsi
+	decq %rdi

> +	movl %edx, %ecx
> +	rep movsb
> +	cld
> +	ret
> +	CFI_ENDPROC
> +ENDPROC(memcpy_backwards)
> +
> diff --git a/arch/x86/lib/memmove_64.c b/arch/x86/lib/memmove_64.c
> index 0a33909..6c00304 100644
> --- a/arch/x86/lib/memmove_64.c
> +++ b/arch/x86/lib/memmove_64.c
> @@ -5,16 +5,16 @@
>  #include <linux/string.h>
>  #include <linux/module.h>
>
> +extern void asmlinkage memcpy_backwards(void *dst, const void *src,
> +					size_t count);

The type of the return value must be "void *".

Thanks
Miao

> +
>  #undef memmove
>  void *memmove(void *dest, const void *src, size_t count)
>  {
>  	if (dest < src) {
>  		return memcpy(dest, src, count);
>  	} else {
> -		char *p = dest + count;
> -		const char *s = src + count;
> -		while (count--)
> -			*--p = *--s;
> +		return memcpy_backwards(dest, src, count);
>  	}
>  	return dest;
>  }