lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 26 Nov 2018 10:12:23 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Andy Lutomirski' <luto@...capital.net>,
        Linus Torvalds <torvalds@...ux-foundation.org>
CC:     Andrew Lutomirski <luto@...nel.org>,
        "dvlasenk@...hat.com" <dvlasenk@...hat.com>,
        Jens Axboe <axboe@...nel.dk>, Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, "bp@...en8.de" <bp@...en8.de>,
        Peter Anvin <hpa@...or.com>,
        the arch/x86 maintainers <x86@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        "brgerst@...il.com" <brgerst@...il.com>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        "pabeni@...hat.com" <pabeni@...hat.com>
Subject: RE: [PATCH] x86: only use ERMS for user copies for larger sizes

From: Andy Lutomirski
> Sent: 23 November 2018 19:11
> > On Nov 23, 2018, at 11:44 AM, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> >
> >> On Fri, Nov 23, 2018 at 10:39 AM Andy Lutomirski <luto@...capital.net> wrote:
> >>
> >> What is memcpy_to_io even supposed to do?  I’m guessing it’s defined as
> >> something like “copy this data to IO space using at most long-sized writes,
> >> all aligned, and writing each byte exactly once, in order.”
> >> That sounds... dubiously useful.
> >
> > We've got hundreds of users of it, so it's fairly common..
> 
> I’m wondering if the “at most long-sizes” restriction matters, especially
> given that we’re apparently accessing some of the same bytes more than once.
> I would believe that trying to encourage 16-byte writes (with AVX, ugh) or
> 64-byte writes (with MOVDIR64B) would be safe and could meaningfully speed
> up some workloads.

The real gains come from increasing the width of IO reads, not IO writes.
None of the x86 cpus I've got issue multiple concurrent PCIe reads
(the PCIe completion tag seems to match the core number).
PCIe writes are all 'posted' so there aren't big gaps between them.

> >> I could see a function that writes to aligned memory in specified-sized chunks.
> >
> > We have that. It's called "__iowrite{32,64}_copy()". It has very few users.

For x86 you want separate entry points for the 'rep movq' copy
and one using an instruction loop.
(Perhaps with guidance to the cutover length.)
In most places the driver will know whether the size is above or below
the cutover - which might be 256.
Certainly transfers below 64 bytes are 'short'.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ