Date:   Tue, 20 Mar 2018 13:30:59 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Ingo Molnar' <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>
CC:     'Rahul Lakkireddy' <rahul.lakkireddy@...lsio.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "hpa@...or.com" <hpa@...or.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>,
        "ganeshgr@...lsio.com" <ganeshgr@...lsio.com>,
        "nirranjan@...lsio.com" <nirranjan@...lsio.com>,
        "indranil@...lsio.com" <indranil@...lsio.com>,
        "Andy Lutomirski" <luto@...nel.org>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Fenghua Yu <fenghua.yu@...el.com>,
        Eric Biggers <ebiggers3@...il.com>
Subject: RE: [RFC PATCH 0/3] kernel: add support for 256-bit IO access

From: Ingo Molnar
> Sent: 20 March 2018 10:54
...
> Note that a generic version might still be worth trying out, if and only if it's
> safe to access those vector registers directly: modern x86 CPUs will do their
> non-constant memcpy()s via the common memcpy_erms() function - which could in
> theory be an easy common point to be (cpufeatures-) patched to an AVX2 variant, if
> size (and alignment, perhaps) is a multiple of 32 bytes or so.
> 
> Assuming it's correct with arbitrary user-space FPU state and if it results in any
> measurable speedups, which might not be the case: ERMS is supposed to be very
> fast.
> 
> So even if it's possible (which it might not be), it could end up being slower
> than the ERMS version.

Last I checked, memcpy() was implemented as 'rep movsb' on the latest Intel CPUs.
Since memcpy_toio() and memcpy_fromio() get aliased to memcpy(), this generates
byte copies.
The previous 'fastest' version of memcpy() was OK for uncached locations.

For PCIe I suspect that the actual instructions don't make a massive difference.
I'm not even sure interleaving two transfers makes any difference.
What makes a huge difference for memcpy_fromio() is the size of the register:
the time taken for a read will be largely independent of the width of the
register used, so a wider register moves more bytes in the same time.
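
To put rough numbers on the register-width point, here is a minimal user-space
sketch, not kernel code. It assumes every read costs the same fixed latency
whatever its width. copy_counting_reads() and estimated_ns() are hypothetical
names made up for this illustration, and any latency value passed in is a
placeholder, not a measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model: copy len bytes in chunks of `width` bytes and
 * return how many chunks were needed. Each chunk stands in for one
 * read transaction; the memcpy() here only moves the data in the model.
 */
static size_t copy_counting_reads(unsigned char *dst, const unsigned char *src,
                                  size_t len, size_t width)
{
    size_t reads = 0;

    /* The model only handles lengths that are a multiple of the width. */
    assert(width > 0 && len % width == 0);

    for (size_t off = 0; off < len; off += width) {
        memcpy(dst + off, src + off, width);
        reads++;
    }
    return reads;
}

/*
 * Estimated copy time under the assumption that every read takes the
 * same time regardless of width. ns_per_read is a caller-supplied guess.
 */
static double estimated_ns(size_t reads, double ns_per_read)
{
    return (double)reads * ns_per_read;
}
```

Under that assumption a 256-byte copy needs 256 reads at 1 byte wide, 32 reads
at 8 bytes wide and 8 reads at 32 bytes wide, so the estimated time scales down
with the register width and not with the choice of instruction.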

	David
