Message-ID: <035ac754e54a4b14844f3300ae1432a9@AcuMS.aculab.com>
Date: Thu, 22 Mar 2018 10:48:19 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Linus Torvalds' <torvalds@...ux-foundation.org>,
Alexander Duyck <alexander.duyck@...il.com>
CC: Rahul Lakkireddy <rahul.lakkireddy@...lsio.com>,
Thomas Gleixner <tglx@...utronix.de>,
"x86@...nel.org" <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"mingo@...hat.com" <mingo@...hat.com>,
"hpa@...or.com" <hpa@...or.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
Ganesh GR <ganeshgr@...lsio.com>,
"Nirranjan Kirubaharan" <nirranjan@...lsio.com>,
Indranil Choudhury <indranil@...lsio.com>
Subject: RE: [RFC PATCH 2/3] x86/io: implement 256-bit IO read and write
From: Linus Torvalds
> Sent: 22 March 2018 01:27
> On Tue, Mar 20, 2018 at 7:42 AM, Alexander Duyck
> <alexander.duyck@...il.com> wrote:
> >
> > Instead of framing this as an enhanced version of the read/write ops
> > why not look at replacing or extending something like the
> > memcpy_fromio or memcpy_toio operations?
>
> Yes, doing something like "memcpy_fromio_avx()" is much more
> palatable, in that it works like the crypto functions do - if you do
> big chunks, the "kernel_fpu_begin/end()" isn't nearly the issue it can
> be otherwise.
>
> Note that we definitely have seen hardware that *depends* on the
> regular "memcpy_fromio()" not doing big reads. I don't know how
> hardware people screw it up, but it's clearly possible.
I wonder if that hardware works with the current kernel on recent cpus?
I bet it doesn't like the byte accesses that get generated either.
> So it really needs to be an explicitly named function that basically a
> driver can use to say "my hardware really likes big aligned accesses"
> and explicitly ask for some AVX version if possible.
For x86, being able to request a copy done as 'rep movsx' (for each
access size x: byte, word, dword, qword) would be useful.
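To illustrate the idea of a caller choosing the access width, here is a
minimal user-space sketch in portable C. It is not kernel code and not an
existing API: the name copy_fixed_width() and its signature are
hypothetical, it operates on ordinary memory rather than an __iomem
mapping, and plain volatile loads stand in for what would be a 'rep movs'
of the matching size on x86. The volatile qualifier keeps each source load
a separate access, but the exact instructions emitted are still up to the
compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* One copy loop per access type; every source read is a volatile load
 * of exactly sizeof(type) bytes. */
#define COPY_LOOP(type)                                  \
	do {                                             \
		const volatile type *s = src;            \
		type *d = dst;                           \
		for (i = 0; i < len / sizeof(type); i++) \
			d[i] = s[i];                     \
	} while (0)

/*
 * Hypothetical helper (not a kernel interface): copy 'len' bytes from
 * 'src' to 'dst' using only reads of 'width' bytes (1, 2, 4 or 8).
 * Returns 0 on success, or -1 if the width is unsupported or if the
 * length or either pointer is not a multiple of the width.
 */
static int copy_fixed_width(void *dst, const volatile void *src,
			    size_t len, size_t width)
{
	size_t i;

	if (width != 1 && width != 2 && width != 4 && width != 8)
		return -1;
	if (len % width || (uintptr_t)dst % width || (uintptr_t)src % width)
		return -1;

	switch (width) {
	case 1: COPY_LOOP(uint8_t);  break;
	case 2: COPY_LOOP(uint16_t); break;
	case 4: COPY_LOOP(uint32_t); break;
	case 8: COPY_LOOP(uint64_t); break;
	}
	return 0;
}
```

A driver-style caller would pass the width its device prefers, e.g.
copy_fixed_width(buf, regs, 64, 8), and fall back to a narrower width if
the call returns -1. Rejecting misaligned or odd-length requests, rather
than silently mixing access sizes, is a design choice of this sketch.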
For io copies the cost of the memory access is probably much smaller
than the io access, so really fancy copies are unlikely to make much
difference unless the width of the io access changes.
David