Message-ID: <20180320105427.bm4od7cpessbraag@gmail.com>
Date:   Tue, 20 Mar 2018 11:54:27 +0100
From:   Ingo Molnar <mingo@...nel.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     David Laight <David.Laight@...LAB.COM>,
        'Rahul Lakkireddy' <rahul.lakkireddy@...lsio.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "hpa@...or.com" <hpa@...or.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>,
        "ganeshgr@...lsio.com" <ganeshgr@...lsio.com>,
        "nirranjan@...lsio.com" <nirranjan@...lsio.com>,
        "indranil@...lsio.com" <indranil@...lsio.com>,
        Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Fenghua Yu <fenghua.yu@...el.com>,
        Eric Biggers <ebiggers3@...il.com>
Subject: Re: [RFC PATCH 0/3] kernel: add support for 256-bit IO access


* Thomas Gleixner <tglx@...utronix.de> wrote:

> On Tue, 20 Mar 2018, Ingo Molnar wrote:
> > * Thomas Gleixner <tglx@...utronix.de> wrote:
> > 
> > > > So I do think we could do more in this area to improve driver performance, if the 
> > > > code is correct and if there's actual benchmarks that are showing real benefits.
> > > 
> > > If it's about hotpath performance I'm all for it, but the use case here is
> > > a debug facility...
> > > 
> > > And if we go down that road then we want a AVX based memcpy()
> > > implementation which is runtime conditional on the feature bit(s) and
> > > length dependent. Just slapping a readqq() at it and use it in a loop does
> > > not make any sense.
> > 
> > Yeah, so generic memcpy() replacement is only feasible I think if the most 
> > optimistic implementation is actually correct:
> > 
> >  - if no preempt disable()/enable() is required
> > 
> >  - if direct access to the AVX[2] registers does not disturb legacy FPU state in 
> >    any fashion
> > 
> >  - if direct access to the AVX[2] registers cannot raise weird exceptions or have
> >    weird behavior if the FPU control word is modified to non-standard values by 
> >    untrusted user-space
> > 
> > If we have to touch the FPU tag or control words then it's probably only good for 
> > a specialized API.
> 
> I did not mean to have a general memcpy replacement. Rather something like
> magic_memcpy() which falls back to memcpy when AVX is not usable or the
> length does not justify the AVX stuff at all.

OK, fair enough.

Note that a generic version might still be worth trying out, if and only if it's 
safe to access those vector registers directly: modern x86 CPUs do their 
non-constant-size memcpy()s via the common memcpy_erms() function, which could in 
theory be an easy common point to patch (via cpufeatures) to an AVX2 variant, if 
the size (and perhaps the alignment) is a multiple of 32 bytes or so.

That assumes it's correct with arbitrary user-space FPU state and that it results 
in a measurable speedup, which might not be the case: ERMS is supposed to be very 
fast.

So even if it's possible (which it might not be), it could end up being slower 
than the ERMS version.
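
To make the dispatch idea concrete, here is a rough user-space sketch, not kernel 
code and not a proposed patch. It assumes GCC or Clang on x86 for the AVX2 path 
and falls back to plain memcpy() everywhere else. The 256-byte cutoff 
(MAGIC_MIN_LEN) and the helper names copy_avx2() and avx2_usable() are made up 
for illustration; only magic_memcpy() is a name from this thread. Because it 
runs in user space it sidesteps the actual open question, which is whether the 
kernel can touch the vector registers without saving and restoring FPU state.

```c
#include <stddef.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_AVX2_PATH 1

/* Copy len bytes with 256-bit loads/stores; len must be a multiple of 32. */
__attribute__((target("avx2")))
static void copy_avx2(void *dst, const void *src, size_t len)
{
	const unsigned char *s = src;
	unsigned char *d = dst;
	size_t i;

	for (i = 0; i < len; i += 32) {
		/* Unaligned variants: no alignment requirement on s or d. */
		__m256i v = _mm256_loadu_si256((const __m256i *)(s + i));
		_mm256_storeu_si256((__m256i *)(d + i), v);
	}
}

/* Runtime feature check, standing in for a CPU feature bit test. */
static int avx2_usable(void)
{
	return __builtin_cpu_supports("avx2");
}
#else
#define HAVE_AVX2_PATH 0
#endif

/* Hypothetical cutoff: below this, assume the vector path is not worth it. */
#define MAGIC_MIN_LEN 256

/*
 * Use the AVX2 loop only if the length is large enough, is a multiple
 * of 32 bytes, and the CPU reports AVX2. Otherwise fall back to memcpy().
 */
void *magic_memcpy(void *dst, const void *src, size_t len)
{
#if HAVE_AVX2_PATH
	if (len >= MAGIC_MIN_LEN && (len % 32) == 0 && avx2_usable()) {
		copy_avx2(dst, src, len);
		return dst;
	}
#endif
	return memcpy(dst, src, len);
}
```

With this, a 512-byte copy takes the vector loop on an AVX2-capable machine, 
while a 100-byte or 300-byte copy goes straight to memcpy(). Whether the vector 
loop beats the library memcpy() (or ERMS) at any size would have to be measured.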

Thanks,

	Ingo
