lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 20 Mar 2018 09:26:51 +0100
From:   Ingo Molnar <mingo@...nel.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     David Laight <David.Laight@...LAB.COM>,
        'Rahul Lakkireddy' <rahul.lakkireddy@...lsio.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "hpa@...or.com" <hpa@...or.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>,
        "ganeshgr@...lsio.com" <ganeshgr@...lsio.com>,
        "nirranjan@...lsio.com" <nirranjan@...lsio.com>,
        "indranil@...lsio.com" <indranil@...lsio.com>,
        Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Thomas Gleixner <tglx@...utronix.de>,
        Fenghua Yu <fenghua.yu@...el.com>,
        Eric Biggers <ebiggers3@...il.com>
Subject: Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access


* Thomas Gleixner <tglx@...utronix.de> wrote:

> > Useful also for code that needs AVX-like registers to do things like CRCs.
> 
> x86/crypto/ has a lot of AVX optimized code.

Yeah, that's true, but the crypto code is processing fundamentally bigger blocks 
of data, which amortizes the cost of using kernel_fpu_begin()/_end().

kernel_fpu_begin()/_end() is a pretty heavy operation because it does a full FPU 
save/restore via the XSAVE[S] and XRSTOR[S] instructions, which can easily copy a 
thousand bytes around! So kernel_fpu_begin()/_end() is probably a non-starter for 
something small, like a single 256-bit or 512-bit word access.

But there's actually a new thing in modern kernels: we got rid of (most of) lazy 
save/restore FPU code, our new x86 FPU model is very "direct" with no FPU faults 
taken normally.

So assuming the target driver will only load on modern FPUs I *think* it should 
actually be possible to do something like (pseudocode):

	vmovdqa %ymm0, 40(%rsp)
	vmovdqa %ymm1, 80(%rsp)

	...
	# use ymm0 and ymm1
	...

	vmovdqa 80(%rsp), %ymm1
	vmovdqa 40(%rsp), %ymm0

... without using the heavy XSAVE/XRSTOR instructions.

Note that preemption probably still needs to be disabled and possibly there are 
other details as well, but there should be no 'heavy' FPU operations.

I think this should still preserve all user-space FPU state and shouldn't muck up 
any 'weird' user-space FPU state (such as pending exceptions, legacy x87 running 
code, NaN registers or weird FPU control word settings) we might have interrupted 
either.

But I could be wrong, it should be checked whether this sequence is safe. 
Worst-case we might have to save/restore the FPU control and tag words - but those 
operations should still be much faster than a full XSAVE/XRSTOR pair.

So I do think we could do more in this area to improve driver performance, if the 
code is correct and if there's actual benchmarks that are showing real benefits.

Thanks,

	Ingo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ