Message-ID: <7f0ddb3678814c7bab180714437795e0@AcuMS.aculab.com>
Date: Mon, 19 Mar 2018 14:53:17 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Rahul Lakkireddy' <rahul.lakkireddy@...lsio.com>,
"x86@...nel.org" <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
CC: "tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"hpa@...or.com" <hpa@...or.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>,
"ganeshgr@...lsio.com" <ganeshgr@...lsio.com>,
"nirranjan@...lsio.com" <nirranjan@...lsio.com>,
"indranil@...lsio.com" <indranil@...lsio.com>
Subject: RE: [RFC PATCH 0/3] kernel: add support for 256-bit IO access
From: Rahul Lakkireddy
> Sent: 19 March 2018 14:21
>
> This series of patches add support for 256-bit IO read and write.
> The APIs are readqq and writeqq (quad quadword - 4 x 64), that read
> and write 256-bits at a time from IO, respectively.
Why not use the AVX-512 (zmm) registers to get 512-bit accesses?
> Patch 1 adds u256 type and adds necessary non-atomic accessors. Also
> adds byteorder conversion APIs.
>
> Patch 2 adds 256-bit read and write to x86 via VMOVDQU AVX CPU
> instructions.
>
> Patch 3 updates cxgb4 driver to use the readqq API to speed up
> reading on-chip memory 256-bits at a time.
Calling kernel_fpu_begin() is likely to be slow.
I doubt you want to do it on every iteration of a loop of accesses.
In principle it ought to be possible to get access to, say, one or two
AVX registers by saving them to the stack and telling the FPU
save code where you've put them.
The IPI FP save code could then copy the saved values over
the current values if asked to save the FP state for a process.
This should be reasonably cheap, especially when no FP save IPI
arrives while the registers are in use.
OTOH, for x86, if the code always runs in process context (e.g. from a
system call) then, since the ABI defines all the AVX(2) registers as
caller-saved, it is only necessary to ensure once that the current
FPU registers belong to the current process.
The registers can be zeroed by a single instruction (such as VZEROALL)
on system call entry (hopefully this is already done) and again after use.
David