Message-ID: <CA+55aFz1RRY6KcqVVZ9tBH7PDXfBwkZ1AhJcSHPABLXMkNJCOA@mail.gmail.com>
Date:	Thu, 1 Sep 2011 09:18:32 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Maarten Lankhorst <m.b.lankhorst@...il.com>
Cc:	Borislav Petkov <bp@...64.org>,
	"Valdis.Kletnieks@...edu" <Valdis.Kletnieks@...edu>,
	Borislav Petkov <bp@...en8.de>, Ingo Molnar <mingo@...e.hu>,
	melwyn lobo <linux.melwyn@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"H. Peter Anvin" <hpa@...or.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: x86 memcpy performance

On Thu, Sep 1, 2011 at 8:15 AM, Maarten Lankhorst
<m.b.lankhorst@...il.com> wrote:
>
> This work intrigued me, in some cases kernel memcpy was a lot faster than sse memcpy,
> and I finally figured out why. I also extended the test to an optimized avx memcpy,
> but I think the kernel memcpy will always win in the aligned case.

"rep movs" is generally optimized in microcode on most modern Intel
CPUs for some easyish cases, and it will outperform just about
anything.

Atom is a notable exception, but if you expect performance on any
general loads from Atom, you need to get your head examined. Atom is a
disaster for anything but tuned loops.

The "easyish cases" depend on microarchitecture. They are improving,
so long-term "rep movs" is the best way regardless, but for most
current ones it's something like "source aligned to 8 bytes *and*
source and destination are equal mod 64".
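
As a rough illustration, the rule of thumb above can be written as a
pointer check. This is only a sketch: the helper name
rep_movs_easy_case() is hypothetical, and the real fast-path conditions
vary by microarchitecture and are not spelled out in this message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): returns true when a copy
 * matches the rule of thumb quoted above, i.e. the source is aligned
 * to 8 bytes and source and destination have the same offset within
 * a 64-byte block.
 */
static bool rep_movs_easy_case(const void *dst, const void *src)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    bool src_aligned_8 = (s % 8) == 0;
    bool same_mod_64 = (d % 64) == (s % 64);

    return src_aligned_8 && same_mod_64;
}
```

A page copy passes this check trivially, since both pages start on a
4096-byte boundary.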

And that's true in a lot of common situations. It's true for the page
copy, for example, and it's often true for big user "read()/write()"
calls (but "often" may not be "often enough" - high-performance
userland should strive to align read/write buffers to 64 bytes, for
example).
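
One way userland might follow the 64-byte advice is to allocate I/O
buffers with C11 aligned_alloc(). This is a minimal sketch; the names
alloc_io_buffer() and IO_ALIGN are made up for illustration, and the
size is rounded up because C11 expects it to be a multiple of the
alignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define IO_ALIGN ((size_t)64)   /* assumed alignment, per the advice above */

/*
 * Hypothetical helper: allocate a buffer for read()/write() whose
 * address is a multiple of 64 bytes. Returns NULL on failure.
 * The caller releases it with free().
 */
static void *alloc_io_buffer(size_t size)
{
    size_t rounded = (size + IO_ALIGN - 1) / IO_ALIGN * IO_ALIGN;

    if (rounded < size)         /* size + 63 overflowed */
        return NULL;
    if (rounded == 0)           /* avoid a zero-sized request */
        rounded = IO_ALIGN;

    return aligned_alloc(IO_ALIGN, rounded);
}
```

Whether the kernel-side buffer ends up with the same offset mod 64
depends on the call, so this only improves the odds of hitting the
fast case.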

Many other cases of "memcpy()" are the fairly small, constant-sized
ones, where the optimal strategy tends to be "move words by hand".
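
For a small, constant-sized copy, "move words by hand" can look like
the sketch below. The function name copy16() is hypothetical, and the
16-byte size is just an example; the point is that a fixed size lets
the copy become a couple of plain loads and stores with no loop and no
call.

```c
#include <stdint.h>

/*
 * Hypothetical example: copy exactly 16 bytes as two 64-bit words.
 * Both pointers are assumed to refer to properly aligned uint64_t
 * storage.
 */
static inline void copy16(uint64_t *dst, const uint64_t *src)
{
    dst[0] = src[0];
    dst[1] = src[1];
}
```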

                      Linus