Message-ID: <afebfe59fadb46fbbd4b09eaf0798592@AcuMS.aculab.com>
Date: Mon, 26 Nov 2018 10:12:23 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Andy Lutomirski' <luto@...capital.net>,
Linus Torvalds <torvalds@...ux-foundation.org>
CC: Andrew Lutomirski <luto@...nel.org>,
"dvlasenk@...hat.com" <dvlasenk@...hat.com>,
Jens Axboe <axboe@...nel.dk>, Ingo Molnar <mingo@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, "bp@...en8.de" <bp@...en8.de>,
Peter Anvin <hpa@...or.com>,
the arch/x86 maintainers <x86@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
"brgerst@...il.com" <brgerst@...il.com>,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
"pabeni@...hat.com" <pabeni@...hat.com>
Subject: RE: [PATCH] x86: only use ERMS for user copies for larger sizes
From: Andy Lutomirski
> Sent: 23 November 2018 19:11
> > On Nov 23, 2018, at 11:44 AM, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> >
> >> On Fri, Nov 23, 2018 at 10:39 AM Andy Lutomirski <luto@...capital.net> wrote:
> >>
> >> What is memcpy_to_io even supposed to do? I’m guessing it’s defined as
> >> something like “copy this data to IO space using at most long-sized writes,
> >> all aligned, and writing each byte exactly once, in order.”
> >> That sounds... dubiously useful.
> >
> > We've got hundreds of users of it, so it's fairly common..
>
> I’m wondering if the “at most long-sized” restriction matters, especially
> given that we’re apparently accessing some of the same bytes more than once.
> I would believe that trying to encourage 16-byte writes (with AVX, ugh) or
> 64-byte writes (with MOVDIR64B) would be safe and could meaningfully speed
> up some workloads.
The real gains come from increasing the width of IO reads, not IO writes.
None of the x86 CPUs I've got issue multiple concurrent PCIe reads
(the PCIe completion tag seems to match the core number), so each read
has to wait for its completion before the next one is issued.
PCIe writes are all 'posted', so there aren't big gaps between them.
> >> I could see a function that writes to aligned memory in specified-sized chunks.
> >
> > We have that. It's called "__iowrite{32,64}_copy()". It has very few users.
For x86 you want separate entry points: one for the 'rep movsq' copy
and one using an instruction loop
(perhaps with guidance on the cutover length).
In most places the driver will know whether the size is above or below
the cutover, which might be 256 bytes.
Certainly transfers below 64 bytes are 'short'.
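
As a rough illustration of the split suggested above, here is a minimal
userspace sketch. The names copy_short(), copy_long(), copy_auto() and
COPY_CUTOVER are hypothetical and are not kernel APIs. It copies ordinary
memory rather than IO space, and memcpy() stands in for the 'rep movsq'
path so that the sketch stays portable. The 256-byte cutover is only the
guess made above, not a measured value.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutover length; "might be 256" per the discussion. */
#define COPY_CUTOVER 256

/*
 * Short-copy entry point: a plain instruction loop that moves
 * 8 bytes at a time, followed by a byte loop for the tail.
 */
static void copy_short(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (len >= sizeof(uint64_t)) {
		uint64_t w;

		/* memcpy() of a fixed 8 bytes avoids unaligned-access UB. */
		memcpy(&w, s, sizeof(w));
		memcpy(d, &w, sizeof(w));
		d += sizeof(w);
		s += sizeof(w);
		len -= sizeof(w);
	}
	while (len--)
		*d++ = *s++;
}

/*
 * Long-copy entry point: stand-in for a 'rep movsq' based copy.
 * Assumption: memcpy() is used here only to keep the sketch portable.
 */
static void copy_long(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

/* Dispatcher for callers that do not know the size in advance. */
static void copy_auto(void *dst, const void *src, size_t len)
{
	if (len < COPY_CUTOVER)
		copy_short(dst, src, len);
	else
		copy_long(dst, src, len);
}
```

A driver that always moves, say, a 32-byte descriptor could call
copy_short() directly and skip the length test; only callers with
variable sizes would go through copy_auto().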
David