Message-Id: <C74CD745-0E6D-410C-B942-416AF365B492@amacapital.net>
Date: Fri, 23 Nov 2018 11:39:29 -0700
From: Andy Lutomirski <luto@...capital.net>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: David.Laight@...lab.com, Andrew Lutomirski <luto@...nel.org>,
dvlasenk@...hat.com, Jens Axboe <axboe@...nel.dk>,
Ingo Molnar <mingo@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, bp@...en8.de,
Peter Anvin <hpa@...or.com>,
the arch/x86 maintainers <x86@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>, brgerst@...il.com,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
pabeni@...hat.com
Subject: Re: [PATCH] x86: only use ERMS for user copies for larger sizes
> On Nov 23, 2018, at 10:42 AM, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
>
> On Fri, Nov 23, 2018 at 8:36 AM Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
>>
>> Let me write a generic routine in lib/iomap_copy.c (which already does
>> the "user specifies chunk size" cases), and hook it up for x86.
>
> Something like this?
>
> ENTIRELY UNTESTED! It might not compile. Seriously. And if it does
> compile, it might not work.
>
> And this doesn't actually do the memset_io() function at all, just the
> memcpy ones.
>
> Finally, it's worth noting that on x86, we have this:
>
> /*
> * override generic version in lib/iomap_copy.c
> */
> ENTRY(__iowrite32_copy)
> movl %edx,%ecx
> rep movsd
> ret
> ENDPROC(__iowrite32_copy)
>
> because back in 2006, we did this:
>
> [PATCH] Add faster __iowrite32_copy routine for x86_64
>
> This assembly version is measurably faster than the generic version in
> lib/iomap_copy.c.
>
> which actually implies that "rep movsd" is faster than doing
> __raw_writel() by hand.
>
> So it is possible that this should all be arch-specific code rather
> than that butt-ugly "generic" code I wrote in this patch.
>
> End result: I'm not really all that happy about this patch, but it's
> perhaps worth testing, and it's definitely worth discussing. Because
> our current memcpy_{to,from}io() is truly broken garbage.
>
>
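For reference, here is a minimal C sketch of the kind of per-word loop that the 2006 patch replaced with "rep movsd". It assumes userspace. The helper raw_writel_sketch() is hypothetical and stands in for the kernel's __raw_writel() as a plain volatile 32-bit store with no barriers. The count is in 32-bit words, not bytes, which matches the assembly version, where the count arrives in %edx and is moved into %ecx for "rep movsd".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for __raw_writel(): one volatile
 * 32-bit store, with no byte swapping and no memory barriers. */
static inline void raw_writel_sketch(uint32_t val, volatile uint32_t *addr)
{
    *addr = val;
}

/* Copy `count` 32-bit words (not bytes) from `from` to `to`, issuing one
 * store per word in ascending address order. The x86 assembly version
 * does the same job with `rep movsd` and the same word count in %ecx. */
void iowrite32_copy_sketch(volatile void *to, const void *from, size_t count)
{
    volatile uint32_t *dst = to;
    const uint32_t *src = from;
    const uint32_t *end = src + count;

    while (src < end)
        raw_writel_sketch(*src++, dst++);
}
```

The 2006 measurement suggests the single string instruction beats this explicit loop on x86. This sketch only shows what the two versions have in common: one aligned 32-bit store per word, in order.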
What is memcpy_toio() even supposed to do? I’m guessing it’s defined as something like “copy this data to IO space using at most long-sized writes, all aligned, and writing each byte exactly once, in order.” That sounds... dubiously useful. I could see a function that writes to aligned memory in specified-sized chunks. And I can see a use for a function to just write it in whatever size chunks the architecture thinks is fastest, and *that* should probably use MOVDIR64B.
Or is there some subtlety I’m missing?
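One way to read the guessed-at semantics is shown in the minimal C sketch below. memcpy_toio_sketch() is a hypothetical name and is not the kernel's memcpy_toio(). The sketch simply assumes these rules: every store is naturally aligned, no store is wider than an unsigned long, each destination byte is written exactly once, and stores go in ascending address order. It runs in userspace against ordinary memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the guessed semantics, NOT the kernel's
 * memcpy_toio(): aligned stores only, none wider than unsigned long,
 * each destination byte written exactly once, in ascending order. */
void memcpy_toio_sketch(volatile void *to, const void *from, size_t n)
{
    volatile unsigned char *d = to;
    const unsigned char *s = from;

    /* Head: single-byte stores until d is long-aligned. */
    while (n && ((uintptr_t)d % sizeof(unsigned long))) {
        *d++ = *s++;
        n--;
    }

    /* Body: aligned long-sized stores. The source may be unaligned,
     * so load each word through memcpy() into a local first. */
    while (n >= sizeof(unsigned long)) {
        unsigned long v;
        memcpy(&v, s, sizeof(v));
        *(volatile unsigned long *)d = v;
        d += sizeof(unsigned long);
        s += sizeof(unsigned long);
        n -= sizeof(unsigned long);
    }

    /* Tail: single-byte stores for whatever is left. */
    while (n--)
        *d++ = *s++;
}
```

The byte-wise head and tail are what make "aligned" and "each byte exactly once" hold together. Widening them to cover whole words would either misalign a store or rewrite neighbouring bytes. A "whatever chunk size is fastest" variant would drop exactly these constraints.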