[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f3bfbb3d269f4dc29260bc8a6ae6ef22@AcuMS.aculab.com>
Date: Tue, 20 Mar 2018 13:30:59 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Ingo Molnar' <mingo@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>
CC: 'Rahul Lakkireddy' <rahul.lakkireddy@...lsio.com>,
"x86@...nel.org" <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"mingo@...hat.com" <mingo@...hat.com>,
"hpa@...or.com" <hpa@...or.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>,
"ganeshgr@...lsio.com" <ganeshgr@...lsio.com>,
"nirranjan@...lsio.com" <nirranjan@...lsio.com>,
"indranil@...lsio.com" <indranil@...lsio.com>,
"Andy Lutomirski" <luto@...nel.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Fenghua Yu <fenghua.yu@...el.com>,
Eric Biggers <ebiggers3@...il.com>
Subject: RE: [RFC PATCH 0/3] kernel: add support for 256-bit IO access
From: Ingo Molnar
> Sent: 20 March 2018 10:54
...
> Note that a generic version might still be worth trying out, if and only if it's
> safe to access those vector registers directly: modern x86 CPUs will do their
> non-constant memcpy()s via the common memcpy_erms() function - which could in
> theory be an easy common point to be (cpufeatures-) patched to an AVX2 variant, if
> size (and alignment, perhaps) is a multiple of 32 bytes or so.
>
> Assuming it's correct with arbitrary user-space FPU state and if it results in any
> measurable speedups, which might not be the case: ERMS is supposed to be very
> fast.
>
> So even if it's possible (which it might not be), it could end up being slower
> than the ERMS version.
Last I checked memcpy() was implemented as 'rep movsb' on the latest Intel cpus.
Since memcpy_to/fromio() get aliased to memcpy() this generates byte copies.
The previous 'fastest' version of memcpy() was ok for uncached locations.
For PCIe I suspect that the actual instructions don't make a massive difference.
I'm not even sure interleaving two transfers makes any difference.
What makes a huge difference for memcpy_fromio() is the size of the register.
The time taken for a read will be largely independent of the width of the
register used.
David
Powered by blists - more mailing lists