Message-ID: <20250322223744.353bf74f@pumpkin>
Date: Sat, 22 Mar 2025 22:37:44 +0000
From: David Laight <david.laight.linux@...il.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: linux-kernel@...r.kernel.org, Jens Axboe <axboe@...nel.dk>, David
Howells <dhowells@...hat.com>, Matthew Wilcox <willy@...radead.org>, Andrew
Morton <akpm@...ux-foundation.org>, Alexander Viro
<viro@...iv.linux.org.uk>
Subject: Re: [PATCH next 0/3] iov: Optimise user copies
On Fri, 21 Mar 2025 16:35:52 -0700
Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> On Fri, 21 Mar 2025 at 15:46, David Laight <david.laight.linux@...il.com> wrote:
> >
> > The speculation barrier in access_ok() is expensive.
> >
> > The first patch removes the initial checks when reading the iovec[].
> > The checks are repeated before the actual copy.
> >
> > The second patch uses 'user address masking' if supported.
> >
> > The third removes a lot of code for single entry iovec[].
>
> Ack, except I'd really like to see numbers for things that claim to
> remove expensive stuff.
Except that some of the 'expensive stuff' is missing!
copy_from_user_iter() does:
	if (access_ok())
		raw_copy_from_user();
So it is missing the barrier_nospec().
The error handling is also different from _inline_copy_from_user().
(It doesn't zero-fill after a partial read.)
The observant will also notice that it is missing the massive
performance hit (and code bloat) of check_copy_size() (usercopy hardening).
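To make the zero-fill difference concrete, here is a minimal userspace sketch. It is not the kernel code: sim_raw_copy(), sim_copy_iter() and sim_copy_inline() are hypothetical names, the 'fault' is simulated with a fault_at argument, and access_ok()/barrier_nospec() are left out entirely because they have no userspace equivalent.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for raw_copy_from_user(): pretends the source
 * faults after 'fault_at' bytes and returns the number of bytes NOT copied.
 */
static size_t sim_raw_copy(void *dst, const void *src, size_t len,
			   size_t fault_at)
{
	size_t done = len < fault_at ? len : fault_at;

	memcpy(dst, src, done);
	return len - done;
}

/* Pattern described for copy_from_user_iter(): the tail is left untouched. */
static size_t sim_copy_iter(void *dst, const void *src, size_t len,
			    size_t fault_at)
{
	return sim_raw_copy(dst, src, len, fault_at);
}

/* Pattern described for _inline_copy_from_user(): the uncopied tail is zeroed. */
static size_t sim_copy_inline(void *dst, const void *src, size_t len,
			      size_t fault_at)
{
	size_t left = sim_raw_copy(dst, src, len, fault_at);

	if (left)
		memset((char *)dst + (len - left), 0, left);
	return left;
}
```

With an 8 byte copy that 'faults' after 3 bytes both variants return 5, but only the second leaves the last 5 bytes of the destination zeroed.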
Talking of performance, I've dug out my clock cycle measuring code
(still full of different ipcsum functions).
I'm sure I got 12 bytes/clock on my i7-7 for the loop in the current kernel,
but it is only giving 10 today (possibly I don't have the latest version).
OTOH my new zen5 runs the adox/adcx loop at 16 bytes/clock (i7-7 manages 12).
I'm going to try to find time for some memcpy() experiments.
David
>
> But yeah, the patches look sane.
>
> Linus