Date:	Tue, 9 Feb 2016 15:15:57 -0800
From:	"Luck, Tony" <tony.luck@...el.com>
To:	Borislav Petkov <bp@...en8.de>
Cc:	Ingo Molnar <mingo@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Andy Lutomirski <luto@...nel.org>,
	Dan Williams <dan.j.williams@...el.com>, elliott@....com,
	Brian Gerst <brgerst@...il.com>, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, linux-nvdimm@...1.01.org, x86@...nel.org
Subject: Re: [PATCH v10 3/4] x86, mce: Add __mcsafe_copy()

> You can save yourself this MOV here in what is, I'm assuming, the
> general likely case where @src is aligned and do:
> 
>         /* check for bad alignment of source */
>         testl $7, %esi
>         /* already aligned? */
>         jz 102f
> 
>         movl %esi,%ecx
>         subl $8,%ecx
>         negl %ecx
>         subl %ecx,%edx
> 0:      movb (%rsi),%al
>         movb %al,(%rdi)
>         incq %rsi
>         incq %rdi
>         decl %ecx
>         jnz 0b

The "testl $7, %esi" just checks the low three bits ... it doesn't
change %esi.  But the code from the "subl $8" on down assumes that
%ecx is a number in [1..7] as the count of bytes to copy until we
achieve alignment.

So your "movl %esi,%ecx" needs to be something that just copies the
low three bits and zeroes the high part of %ecx.  Is there a cute
way to do that in x86 assembler?
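
[One option, sketched here for illustration and not taken from the thread,
is simply to mask after the move: "andl $7,%ecx" keeps the low three bits
and zeroes the rest, after which the existing subl/negl sequence yields the
byte count.  A C mirror of that candidate sequence, with the corresponding
instructions in comments:]

```c
#include <stdint.h>

/*
 * Sketch only (not the patch's final code): bytes to copy before
 * @src becomes 8-byte aligned, mirroring the candidate sequence
 *
 *     movl %esi,%ecx
 *     andl $7,%ecx       # keep only the low three bits
 *     subl $8,%ecx
 *     negl %ecx          # ecx = 8 - (src & 7)
 *
 * The caller has already branched away when (src & 7) == 0,
 * so the result is always in [1..7].
 */
static unsigned int head_bytes(uint64_t src)
{
	unsigned int ecx = (uint32_t)src; /* movl %esi,%ecx */

	ecx &= 7;                         /* andl $7,%ecx */
	ecx -= 8;                         /* subl $8,%ecx */
	ecx = -ecx;                       /* negl %ecx    */
	return ecx;
}
```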

> Why aren't we pushing %r12-%r15 on the stack after the "jz 17f" above
> and using them too and thus copying a whole cacheline in one go?
> 
> We would need to restore them when we're done with the cacheline-wise
> shuffle, of course.

I copied that loop from arch/x86/lib/copy_user_64.S:__copy_user_nocache()
I guess the answer depends on whether you generally copy enough
cache lines to save enough time to cover the cost of saving and
restoring those registers.
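
[A toy model of that tradeoff, with entirely hypothetical costs just to
make the reasoning concrete: if pushing and popping %r12-%r15 costs a
fixed S cycles and the wider loop saves P cycles per 64-byte cache line,
the wider loop only wins once the copy spans more than S/P lines.]

```c
/*
 * Toy break-even model -- the costs are hypothetical parameters,
 * not measurements.  save_restore_cost: fixed cycles to push/pop
 * %r12-%r15; saving_per_line: cycles saved per cache line by the
 * wider loop.  The wider loop wins once
 *     lines * saving_per_line > save_restore_cost.
 */
static unsigned int break_even_lines(unsigned int save_restore_cost,
				     unsigned int saving_per_line)
{
	/* smallest line count where the wider loop comes out ahead */
	return save_restore_cost / saving_per_line + 1;
}
```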

-Tony
