lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 16 Sep 2010 17:29:32 +0800
From:	Miao Xie <miaox@...fujitsu.com>
To:	Andi Kleen <andi@...stfloor.org>
CC:	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...e.hu>, "Theodore Ts'o" <tytso@....edu>,
	Chris Mason <chris.mason@...cle.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	Linux Btrfs <linux-btrfs@...r.kernel.org>,
	Linux Ext4 <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH] x86_64/lib: improve the performance of memmove

On Thu, 16 Sep 2010 10:40:08 +0200, Andi Kleen wrote:
> On Thu, 16 Sep 2010 15:16:31 +0800
> Miao Xie<miaox@...fujitsu.com>  wrote:
>
>> On Thu, 16 Sep 2010 08:48:25 +0200 (cest), Andi Kleen wrote:
>>>> When the dest and the src do overlap and the memory area is large,
>>>> memmove of
>>>> x86_64 is very inefficient, and it led to bad performance, such as
>>>> btrfs's file
>>>> deletion performance. This patch improved the performance of
>>>> memmove on x86_64
>>>> by using __memcpy_bwd() instead of byte copy when doing large
>>>> memory area copy
>>>> (len>   64).
>>>
>>>
>>> I still don't understand why you don't simply use a backwards
>>> string copy (with std) ? That should be much simpler and
>>> hopefully be as optimized for kernel copies on recent CPUs.
>>
>> But according to the comment of memcpy, some CPUs don't support "REP"
>> instruction,
>
> I think you misread the comment. REP prefixes are in all x86 CPUs.
> On some very old systems it wasn't optimized very well,
> but it probably doesn't make too much sense to optimize for those.
> On newer CPUs in fact REP should be usually faster than
> an unrolled loop.
>
> I'm not sure how optimized the backwards copy is, but most likely
> it is optimized too.
>
> Here's an untested patch that implements backwards copy with string
> instructions. Could you run it through your test harness?

Ok, I'll do it.

> +
> +/*
> + * Copy memory backwards (for memmove)
> + * rdi target
> + * rsi source
> + * rdx count
> + */
> +
> +ENTRY(memcpy_backwards):

s/://

> +	CFI_STARTPROC
> +	std
> +	movq %rdi, %rax
> +	movl %edx, %ecx
> +	add  %rdx, %rdi
> +	add  %rdx, %rsi

-	add  %rdx, %rdi
-	add  %rdx, %rsi
+	addq  %rdx, %rdi
+	addq  %rdx, %rsi

Besides that, the address that %rdi/%rsi pointed to is over the end of
mempry area that going to be copied, so we must tune the address to be
correct.
+	leaq -8(%rdi), %rdi
+	leaq -8(%rsi), %rsi

> +	shrl $3, %ecx
> +	andl $7, %edx
> +	rep movsq

The same as above.
+	leaq 8(%rdi), %rdi
+	leaq 8(%rsi), %rsi
+	decq %rsi
+	decq %rdi

> +	movl %edx, %ecx
> +	rep movsb
> +	cld
> +	ret
> +	CFI_ENDPROC
> +ENDPROC(memcpy_backwards)
> +	
> diff --git a/arch/x86/lib/memmove_64.c b/arch/x86/lib/memmove_64.c
> index 0a33909..6c00304 100644
> --- a/arch/x86/lib/memmove_64.c
> +++ b/arch/x86/lib/memmove_64.c
> @@ -5,16 +5,16 @@
>   #include<linux/string.h>
>   #include<linux/module.h>
>
> +extern void asmlinkage memcpy_backwards(void *dst, const void *src,
> +				        size_t count);

The type of the return value must be "void *".

Thanks
Miao

> +
>   #undef memmove
>   void *memmove(void *dest, const void *src, size_t count)
>   {
>   	if (dest<  src) {
>   		return memcpy(dest, src, count);
>   	} else {
> -		char *p = dest + count;
> -		const char *s = src + count;
> -		while (count--)
> -			*--p = *--s;
> +		return memcpy_backwards(dest, src, count);
>   	}
>   	return dest;
>   }
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists