[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4C91E37C.2060309@cn.fujitsu.com>
Date: Thu, 16 Sep 2010 17:29:32 +0800
From: Miao Xie <miaox@...fujitsu.com>
To: Andi Kleen <andi@...stfloor.org>
CC: Andrew Morton <akpm@...ux-foundation.org>,
Ingo Molnar <mingo@...e.hu>, "Theodore Ts'o" <tytso@....edu>,
Chris Mason <chris.mason@...cle.com>,
Linux Kernel <linux-kernel@...r.kernel.org>,
Linux Btrfs <linux-btrfs@...r.kernel.org>,
Linux Ext4 <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH] x86_64/lib: improve the performance of memmove
On Thu, 16 Sep 2010 10:40:08 +0200, Andi Kleen wrote:
> On Thu, 16 Sep 2010 15:16:31 +0800
> Miao Xie<miaox@...fujitsu.com> wrote:
>
>> On Thu, 16 Sep 2010 08:48:25 +0200 (cest), Andi Kleen wrote:
>>>> When the dest and the src do overlap and the memory area is large,
>>>> memmove of
>>>> x86_64 is very inefficient, and it led to bad performance, such as
>>>> btrfs's file
>>>> deletion performance. This patch improved the performance of
>>>> memmove on x86_64
>>>> by using __memcpy_bwd() instead of byte copy when doing large
>>>> memory area copy
>>>> (len> 64).
>>>
>>>
>>> I still don't understand why you don't simply use a backwards
>>> string copy (with std) ? That should be much simpler and
>>> hopefully be as optimized for kernel copies on recent CPUs.
>>
>> But according to the comment of memcpy, some CPUs don't support "REP"
>> instruction,
>
> I think you misread the comment. REP prefixes are in all x86 CPUs.
> On some very old systems it wasn't optimized very well,
> but it probably doesn't make too much sense to optimize for those.
> On newer CPUs in fact REP should be usually faster than
> an unrolled loop.
>
> I'm not sure how optimized the backwards copy is, but most likely
> it is optimized too.
>
> Here's an untested patch that implements backwards copy with string
> instructions. Could you run it through your test harness?
Ok, I'll do it.
> +
> +/*
> + * Copy memory backwards (for memmove)
> + * rdi target
> + * rsi source
> + * rdx count
> + */
> +
> +ENTRY(memcpy_backwards):
s/://
> + CFI_STARTPROC
> + std
> + movq %rdi, %rax
> + movl %edx, %ecx
> + add %rdx, %rdi
> + add %rdx, %rsi
- add %rdx, %rdi
- add %rdx, %rsi
+ addq %rdx, %rdi
+ addq %rdx, %rsi
Besides that, the address that %rdi/%rsi pointed to is over the end of
mempry area that going to be copied, so we must tune the address to be
correct.
+ leaq -8(%rdi), %rdi
+ leaq -8(%rsi), %rsi
> + shrl $3, %ecx
> + andl $7, %edx
> + rep movsq
The same as above.
+ leaq 8(%rdi), %rdi
+ leaq 8(%rsi), %rsi
+ decq %rsi
+ decq %rdi
> + movl %edx, %ecx
> + rep movsb
> + cld
> + ret
> + CFI_ENDPROC
> +ENDPROC(memcpy_backwards)
> +
> diff --git a/arch/x86/lib/memmove_64.c b/arch/x86/lib/memmove_64.c
> index 0a33909..6c00304 100644
> --- a/arch/x86/lib/memmove_64.c
> +++ b/arch/x86/lib/memmove_64.c
> @@ -5,16 +5,16 @@
> #include<linux/string.h>
> #include<linux/module.h>
>
> +extern void asmlinkage memcpy_backwards(void *dst, const void *src,
> + size_t count);
The type of the return value must be "void *".
Thanks
Miao
> +
> #undef memmove
> void *memmove(void *dest, const void *src, size_t count)
> {
> if (dest< src) {
> return memcpy(dest, src, count);
> } else {
> - char *p = dest + count;
> - const char *s = src + count;
> - while (count--)
> - *--p = *--s;
> + return memcpy_backwards(dest, src, count);
> }
> return dest;
> }
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists