Message-ID: <20250322223744.353bf74f@pumpkin>
Date: Sat, 22 Mar 2025 22:37:44 +0000
From: David Laight <david.laight.linux@...il.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: linux-kernel@...r.kernel.org, Jens Axboe <axboe@...nel.dk>, David
 Howells <dhowells@...hat.com>, Matthew Wilcox <willy@...radead.org>, Andrew
 Morton <akpm@...ux-foundation.org>, Alexander Viro
 <viro@...iv.linux.org.uk>
Subject: Re: [PATCH next 0/3] iov: Optimise user copies

On Fri, 21 Mar 2025 16:35:52 -0700
Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> On Fri, 21 Mar 2025 at 15:46, David Laight <david.laight.linux@...il.com> wrote:
> >
> > The speculation barrier in access_ok() is expensive.
> >
> > The first patch removes the initial checks when reading the iovec[].
> > The checks are repeated before the actual copy.
> >
> > The second patch uses 'user address masking' if supported.
> >
> > The third removes a lot of code for single entry iovec[].  
> 
> Ack, except I'd really like to see numbers for things that claim to
> remove expensive stuff.

Except that some of the 'expensive stuff' is missing!

copy_from_user_iter() does:
	if (access_ok())
		raw_copy_from_user();
So it is missing the barrier_nospec().
The error handling is also different from _inline_copy_from_user().
(It doesn't zero-fill after a partial read.)
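
As a rough illustration of the zero-fill difference only, here is a
userspace model, not kernel code. The model_*() names and the fault_at
parameter are made up for the sketch, and access_ok()/barrier_nospec()
are deliberately left out:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for raw_copy_from_user(): copies until a
 * simulated fault at byte offset 'fault_at' and, like the kernel
 * helper, returns the number of bytes NOT copied.
 */
static size_t model_raw_copy(void *dst, const void *src, size_t len,
			     size_t fault_at)
{
	size_t done = len < fault_at ? len : fault_at;

	memcpy(dst, src, done);
	return len - done;
}

/*
 * Shape described above for copy_from_user_iter(): on a partial copy
 * the tail of the destination keeps whatever it held before.
 */
static size_t model_iter_copy(void *dst, const void *src, size_t len,
			      size_t fault_at)
{
	return model_raw_copy(dst, src, len, fault_at);
}

/*
 * Shape of _inline_copy_from_user(): on a partial copy the uncopied
 * tail of the destination is zero-filled.
 */
static size_t model_inline_copy(void *dst, const void *src, size_t len,
				size_t fault_at)
{
	size_t left = model_raw_copy(dst, src, len, fault_at);

	if (left)
		memset((char *)dst + (len - left), 0, left);
	return left;
}
```

Both models return the same 'bytes not copied' count; they differ only in
what the destination holds past the fault point.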

The observant will also notice that it is missing the massive
performance hit (and code bloat) of check_copy_size() (usercopy hardening).

Talking of performance, I've dug out my clock cycle measuring code
(still full of different ipcsum functions).
I'm sure I got 12 bytes/clock on my i7-7 for the loop in the current kernel,
but it is only giving 10 today (possibly I don't have the latest version).
OTOH my new zen5 runs the adox/adcx loop at 16 bytes/clock (i7-7 manages 12).
I'm going to try to find time for some memcpy() experiments.

	David

> 
> But yeah, the patches look sane.
> 
>           Linus

