linux-kernel - Re: [PATCH] x86: only use ERMS for user copies for larger sizes


Message-Id: <C74CD745-0E6D-410C-B942-416AF365B492@amacapital.net>
Date:   Fri, 23 Nov 2018 11:39:29 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     David.Laight@...lab.com, Andrew Lutomirski <luto@...nel.org>,
        dvlasenk@...hat.com, Jens Axboe <axboe@...nel.dk>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, bp@...en8.de,
        Peter Anvin <hpa@...or.com>,
        the arch/x86 maintainers <x86@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>, brgerst@...il.com,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        pabeni@...hat.com
Subject: Re: [PATCH] x86: only use ERMS for user copies for larger sizes



> On Nov 23, 2018, at 10:42 AM, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> 
> On Fri, Nov 23, 2018 at 8:36 AM Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
>> 
>> Let me write a generic routine in lib/iomap_copy.c (which already does
>> the "user specifies chunk size" cases), and hook it up for x86.
> 
> Something like this?
> 
> ENTIRELY UNTESTED! It might not compile. Seriously. And if it does
> compile, it might not work.
> 
> And this doesn't actually do the memset_io() function at all, just the
> memcpy ones.
> 
> Finally, it's worth noting that on x86, we have this:
> 
>  /*
>   * override generic version in lib/iomap_copy.c
>   */
>  ENTRY(__iowrite32_copy)
>          movl %edx,%ecx
>          rep movsd
>          ret
>  ENDPROC(__iowrite32_copy)
> 
> because back in 2006, we did this:
> 
>    [PATCH] Add faster __iowrite32_copy routine for x86_64
> 
>    This assembly version is measurably faster than the generic version in
>    lib/iomap_copy.c.
> 
> which actually implies that "rep movsd" is faster than doing
> __raw_writel() by hand.
> 
> So it is possible that this should all be arch-specific code rather
> than that butt-ugly "generic" code I wrote in this patch.
> 
> End result: I'm not really all that happy about this patch, but it's
> perhaps worth testing, and it's definitely worth discussing. Because
> our current memcpy_{to,from}io() is truly broken garbage.
> 
>                   

What is memcpy_toio() even supposed to do?  I’m guessing it’s defined as something like “copy this data to IO space using at most long-sized writes, all aligned, and writing each byte exactly once, in order.”  That sounds... dubiously useful.  I could see a function that writes to aligned memory in specified-sized chunks.  And I can see a use for a function to just write it in whatever size chunks the architecture thinks is fastest, and *that* should probably use MOVDIR64B.

Or is there some subtlety I’m missing?
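
To make the guessed semantics above concrete, here is a minimal userspace sketch in C of "at most long-sized writes, all aligned, each byte written exactly once, in order". The name sketch_copy_toio is hypothetical, a plain volatile pointer stands in for __iomem, and this is not the kernel's actual memcpy_toio(); it only illustrates the contract as I'm guessing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch, NOT kernel code: copy n bytes to a destination
 * using byte writes for the unaligned head and tail and aligned,
 * long-sized writes for the middle. Every destination byte is written
 * exactly once, in ascending address order.
 */
static void sketch_copy_toio(volatile void *to, const void *from, size_t n)
{
	volatile unsigned char *dst = to;
	const unsigned char *src = from;

	/* Head: byte writes until the destination is long-aligned. */
	while (n && ((uintptr_t)dst % sizeof(unsigned long))) {
		*dst++ = *src++;
		n--;
	}

	/* Either nothing is left or the destination is now aligned. */
	assert(n == 0 || (uintptr_t)dst % sizeof(unsigned long) == 0);

	/* Middle: aligned long-sized stores. The source may be unaligned,
	 * so load it through memcpy rather than a cast. */
	while (n >= sizeof(unsigned long)) {
		unsigned long v;

		memcpy(&v, src, sizeof(v));
		*(volatile unsigned long *)dst = v;
		dst += sizeof(v);
		src += sizeof(v);
		n -= sizeof(v);
	}

	/* Tail: remaining bytes, one at a time. */
	while (n--)
		*dst++ = *src++;
}
```

Note that only the destination alignment is honoured here; a "specified-sized chunks" variant would instead take the chunk size as a parameter and require the caller to pass an aligned destination and a length that is a multiple of it.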