Message-ID: <CALCETrW=hMZLUM_nPEEgUW9x9Bmhm2LZ6ZRDiRRguxVT=-_BWA@mail.gmail.com>
Date: Thu, 22 Nov 2018 10:06:58 -0800
From: Andy Lutomirski <luto@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: David Laight <David.Laight@...lab.com>,
Denys Vlasenko <dvlasenk@...hat.com>,
Jens Axboe <axboe@...nel.dk>, Ingo Molnar <mingo@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
"H. Peter Anvin" <hpa@...or.com>, X86 ML <x86@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Andrew Lutomirski <luto@...nel.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Brian Gerst <brgerst@...il.com>,
LKML <linux-kernel@...r.kernel.org>, pabeni@...hat.com
Subject: Re: [PATCH] x86: only use ERMS for user copies for larger sizes
On Thu, Nov 22, 2018 at 9:53 AM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Thu, Nov 22, 2018 at 9:36 AM David Laight <David.Laight@...lab.com> wrote:
> >
> > The other problem with the ERMS copy is that it gets used
> > for copy_to/from_io() - and the 'rep movsb' on uncached
> > locations has to do byte copies.
>
> Ugh. I thought we changed that *long* ago, because even our non-ERMS
> copy is broken for PCI (it does overlapping stores for the small tail
> cases).
>
> But looking at "memcpy_{from,to}io()", I don't see x86 overriding it
> with anything better.
>
> I suspect nobody uses those functions for anything critical any more.
> The fbcon people have their own copy functions, iirc.
>
> But we definitely should fix this. *NONE* of the regular memcpy
> functions actually work right for PCI space any more, and haven't for
> a long time.
I'm not personally volunteering, but I suspect we can do much better
than we do now:

 - The new MOVDIRI and MOVDIR64B instructions can do big writes to WC
   and UC memory.  I assume those would be safe to use in ...toio()
   functions, unless there are quirky devices out there that blow up if
   their MMIO space is written in 64-byte chunks.

 - MOVNTDQA can, I think, do 64-byte loads, but only from WC memory.
   For sufficiently large copies, it could plausibly be faster to create
   a WC alias and use MOVNTDQA than it is to copy in 8- or 16-byte
   chunks.  The i915 driver has a copy implementation using MOVNTDQA --
   maybe this should get promoted to something in arch/x86 called
   memcpy_from_wc().
--Andy
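
For illustration only, here is a rough user-space sketch of what a
MOVNTDQA-based copy along the lines of the second bullet might look
like. It is not the i915 code and not a proposed arch/x86
implementation. The names memcpy_from_wc_sketch() and movntdqa_copy()
are made up for this example, and the requirement that both pointers
and the length be 16-byte multiples is an assumption chosen to keep the
loop simple. Anything that does not meet it falls back to plain
memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/*
 * Hypothetical inner loop: 16-byte MOVNTDQA loads (four per 64-byte
 * line) followed by ordinary aligned stores.
 */
__attribute__((target("sse4.1")))
static void movntdqa_copy(void *dst, const void *src, size_t len)
{
	__m128i *d = (__m128i *)dst;
	/* Cast drops const: some older headers declare the intrinsic non-const. */
	__m128i *s = (__m128i *)src;

	/* Assumed precondition of this sketch: everything is 16-byte aligned. */
	assert(((uintptr_t)d & 15) == 0 && ((uintptr_t)s & 15) == 0);
	assert((len & 15) == 0);

	while (len >= 64) {
		__m128i r0 = _mm_stream_load_si128(s + 0);
		__m128i r1 = _mm_stream_load_si128(s + 1);
		__m128i r2 = _mm_stream_load_si128(s + 2);
		__m128i r3 = _mm_stream_load_si128(s + 3);

		_mm_store_si128(d + 0, r0);
		_mm_store_si128(d + 1, r1);
		_mm_store_si128(d + 2, r2);
		_mm_store_si128(d + 3, r3);
		s += 4;
		d += 4;
		len -= 64;
	}
	while (len >= 16) {
		_mm_store_si128(d, _mm_stream_load_si128(s));
		s++;
		d++;
		len -= 16;
	}
}
#endif

/*
 * Hypothetical wrapper. Returns 1 if the MOVNTDQA path was taken,
 * 0 if it fell back to memcpy().
 */
static int memcpy_from_wc_sketch(void *dst, const void *src, size_t len)
{
#if defined(__x86_64__) || defined(__i386__)
	int aligned = (((uintptr_t)dst | (uintptr_t)src | len) & 15) == 0;

	if (aligned && __builtin_cpu_supports("sse4.1")) {
		movntdqa_copy(dst, src, len);
		return 1;
	}
#endif
	memcpy(dst, src, len);
	return 0;
}
```

On ordinary write-back memory the non-temporal hint is ignored and
MOVNTDQA behaves like a normal aligned load, so this only shows the
shape of the loop, not the speedup. An in-kernel version would
presumably also need a WC mapping of the source and
kernel_fpu_begin()/kernel_fpu_end() around the SSE code.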