Date:	Tue, 12 Oct 2010 14:58:26 +0200
From:	Denys Vlasenko <vda.linux@...glemail.com>
To:	ling.ma@...el.com
Cc:	mingo@...e.hu, hpa@...or.com, tglx@...utronix.de,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] [X86/mem] Handle unaligned case by avoiding store
 crossing cache line

On Tue, Oct 12, 2010 at 10:48 PM,  <ling.ma@...el.com> wrote:
> From: Ma Ling <ling.ma@...el.com>
>
> In this patch we manage to reduce the penalty from crossing a cache line
> on some CPU archs. There are two crossing-cache-line cases:
> read and write. Write is more expensive because of the lack of
> cache-way prediction and because of read-for-ownership operations
> on some archs, so here we avoid unaligned stores.
> Another reason is that register shifts would cause more penalty
> in the decode stages, so we tolerate unaligned reads instead.
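
To make the cost concrete (just an illustration, assuming 64-byte cache
lines and a cache-line-aligned buffer in %rbx):

	# dst offset 0x3c is not 8-byte aligned: this store touches
	# bytes 0x3c..0x43 and therefore spans two 64-byte cache lines.
	movq	%rax, 0x3c(%rbx)
	# with the offset rounded up to 0x40, the same 8-byte store
	# stays entirely within one cache line.
	movq	%rax, 0x40(%rbx)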
...
> Signed-off-by: Ma Ling <ling.ma@...el.com>
> ---
>  arch/x86/lib/memcpy_64.S |   59 ++++++++++++++++++++++++++++++++++++++++-----
>  1 files changed, 52 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/lib/memcpy_64.S b/arch/x86/lib/memcpy_64.S
> index 75ef61e..7545b08 100644
> --- a/arch/x86/lib/memcpy_64.S
> +++ b/arch/x86/lib/memcpy_64.S
> @@ -45,7 +45,7 @@ ENTRY(memcpy)
>        /*
>         * Use 32bit CMP here to avoid long NOP padding.
>         */
> -       cmp  $0x20, %edx
> +       cmp  $0x28, %rdx

Well, look above your change. The comment says "Use 32bit CMP".
If you really want to go to a 64-bit one, then change the comment too.
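
For reference, by my reading of the standard x86-64 encodings the 64-bit
form costs one extra REX.W prefix byte, which is presumably what the
NOP-padding comment in this file is about:

	cmp	$0x20, %edx	# 83 FA 20     - 3 bytes
	cmp	$0x28, %rdx	# 48 83 FA 28  - 4 bytes (extra REX.W prefix)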

> +       /*
> +        * We append data to avoid store crossing cache.
> +        */
> +       movq (%rsi), %rcx
> +       movq %rdi, %r8
> +       addq $8, %rdi
> +       andq $-8, %rdi
> +       movq %rcx, (%r8)
> +       subq %rdi, %r8
> +       addq %r8, %rdx
> +       subq %r8, %rsi

The comment doesn't really explain what you are doing here.
Maybe "Align the store address to 8 bytes so stores do not cross cachelines"?

>        /*
> -        * At most 3 ALU operations in one cycle,
> -        * so append NOPS in the same 16bytes trunk.
> +        * We append data to avoid store crossing cache.
>         */

Same here.

-- 
vda
