Date:   Tue, 20 Mar 2018 07:42:15 -0700
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Rahul Lakkireddy <rahul.lakkireddy@...lsio.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "hpa@...or.com" <hpa@...or.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>,
        Ganesh GR <ganeshgr@...lsio.com>,
        Nirranjan Kirubaharan <nirranjan@...lsio.com>,
        Indranil Choudhury <indranil@...lsio.com>
Subject: Re: [RFC PATCH 2/3] x86/io: implement 256-bit IO read and write

On Tue, Mar 20, 2018 at 6:32 AM, Rahul Lakkireddy
<rahul.lakkireddy@...lsio.com> wrote:
> On Mon, Mar 19, 2018 at 20:13:10 +0530, Thomas Gleixner wrote:
>> On Mon, 19 Mar 2018, Rahul Lakkireddy wrote:
>>
>> > Use VMOVDQU AVX CPU instruction when available to do 256-bit
>> > IO read and write.
>>
>> That's not what the patch does. See below.
>>
>> > Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@...lsio.com>
>> > Signed-off-by: Ganesh Goudar <ganeshgr@...lsio.com>
>>
>> That Signed-off-by chain is wrong....
>>
>> > +#ifdef CONFIG_AS_AVX
>> > +#include <asm/fpu/api.h>
>> > +
>> > +static inline u256 __readqq(const volatile void __iomem *addr)
>> > +{
>> > +   u256 ret;
>> > +
>> > +   kernel_fpu_begin();
>> > +   asm volatile("vmovdqu %0, %%ymm0" :
>> > +                : "m" (*(volatile u256 __force *)addr));
>> > +   asm volatile("vmovdqu %%ymm0, %0" : "=m" (ret));
>> > +   kernel_fpu_end();
>> > +   return ret;
>>
>> You _cannot_ assume that the instruction is available just because
>> CONFIG_AS_AVX is set. The availability is determined by the runtime
>> evaluated CPU feature flags, i.e. X86_FEATURE_AVX.
>>
>
> Ok.  Will add boot_cpu_has(X86_FEATURE_AVX) check as well.
>
>> Aside from that, I very much doubt that this is faster than 4 consecutive
>> 64bit reads/writes as you have the full overhead of
>> kernel_fpu_begin()/end() for each access.
>>
>> You did not provide any numbers for this, so it's even harder to
>> determine.
>>
>
> Sorry about that.  Here are the numbers with and without this series.
>
> When reading up to 2 GB of on-chip memory via MMIO, the time taken is:
>
> Without Series        With Series
> (64-bit read)         (256-bit read)
>
> 52 seconds            26 seconds
>
> As can be seen, reading 256 bits at a time gives a good improvement.

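FWIW, if you do stick with an AVX path, both of Thomas's points can be
addressed together: gate on the runtime feature flag, and pay the
kernel_fpu_begin()/end() cost once per transfer instead of once per
256-bit access. Roughly like this (untested sketch; the function name
is made up, u256 is the type from your patch, and the sub-32-byte tail
is punted to memcpy_fromio()):

	static void mmio_read_256b(void *dst,
				   const volatile void __iomem *src,
				   size_t len)
	{
		/* CONFIG_AS_AVX only means the assembler can emit it */
		if (!boot_cpu_has(X86_FEATURE_AVX)) {
			memcpy_fromio(dst, src, len);
			return;
		}

		kernel_fpu_begin();	/* amortized over the whole copy */
		for (; len >= 32; dst += 32, src += 32, len -= 32)
			asm volatile("vmovdqu %1, %%ymm0\n\t"
				     "vmovdqu %%ymm0, %0"
				     : "=m" (*(u256 *)dst)
				     : "m" (*(const volatile u256 __force *)src));
		kernel_fpu_end();

		if (len)	/* tail smaller than 32 bytes */
			memcpy_fromio(dst, src, len);
	}

With the FPU save/restore hoisted out of the loop, the comparison
against four 64-bit reads at least becomes a fair one.
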
Instead of framing this as an enhanced version of the read/write ops,
why not look at replacing or extending something like the
memcpy_fromio or memcpy_toio operations? That would be more comparable
to what you are doing, since you want to move large chunks of memory
from one region to another, and it should translate into something
like AVX instructions once the CPU optimizations kick in for a
memcpy.
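
That way the callers stay dead simple, e.g. (adap->bar2, win_off and
chunk_len are made-up placeholders, just to show the shape):

	/* bulk read of on-chip memory through the BAR window */
	memcpy_fromio(buf, adap->bar2 + win_off, chunk_len);

	/* and the mirror image for the write side */
	memcpy_toio(adap->bar2 + win_off, buf, chunk_len);

and the "which instruction is fastest on this CPU" decision lives in
one place in the arch code instead of in every driver.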

- Alex
