Message-ID: <CAHk-=wjYrfUDOd3VYhn8HvH7MCnampXt1TdtaXo86UDrT_rbMQ@mail.gmail.com>
Date: Wed, 2 Apr 2025 11:56:31 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Andrew Cooper <andrew.cooper3@...rix.com>
Cc: mjguzik@...il.com, linux-kernel@...r.kernel.org, mingo@...hat.com, 
	x86@...nel.org, "Peter Zijlstra (Intel)" <peterz@...radead.org>
Subject: Re: [RFC PATCH] x86: prevent gcc from emitting rep movsq/stosq for
 inlined ops

On Wed, 2 Apr 2025 at 11:40, Andrew Cooper <andrew.cooper3@...rix.com> wrote:
>
> You still want the compiler to be able to do a first-pass optimisation
> over __builtin_mem*(), for elimination/merging/etc

Absolutely.

That's literally why the kernel ends up using __builtin_memcpy() in
the first place - so that the compiler can turn the (very common)
small fixed-size cases into just regular moves.

The function call would only be for actual big moves and/or unknown sizes.
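As a sketch of what that buys you (userspace flavor, made-up struct,
not kernel code):

        #include <stdint.h>

        struct pair {
                uint64_t a;
                uint64_t b;
        };

        static inline void copy_pair(struct pair *dst, const struct pair *src)
        {
                /* Fixed 16-byte size: the compiler lowers this to two
                 * 8-byte loads and stores, no call to memcpy() at all. */
                __builtin_memcpy(dst, src, sizeof(*dst));
        }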

That said, it can be advantageous to have more than one function call.

For the user copy case (where we control the horizontal and the
vertical in the kernel) at one point I did have a patch that did
something fancier than just a single call-out.

Because even when the compiler doesn't know the exact length, it's
often known that it's a multiple of four, for example. It's not all
that uncommon to see code like this:

        memcpy(image->segment, segments, nr_segments * sizeof(*segments));

and an even more common case is "I'm doing an out-of-line call not
because the size is unknown, but because the size is big".
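A hedged illustration of that second case (function invented for the
example): the size is a compile-time constant, just too big for the
inline expansion heuristics:

        #include <string.h>

        void copy_one_page(void *dst, const void *src)
        {
                /* Size known exactly, but at 4096 bytes the inline
                 * expansion heuristics give up and the compiler emits
                 * an out-of-line memcpy() call (or, on x86, possibly
                 * the inline rep movsq this thread is about). */
                memcpy(dst, src, 4096);
        }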

So I encoded the alignment of the size in the function name, so that
I could have different versions for the most obvious straightforward
cases (notably the "just do word-at-a-time copies" one).
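In spirit it looked something like this - the names below are
hypothetical, not the symbols from that (never-merged) patch. The
trick is that gcc's __builtin_constant_p() can often prove "n % 8 == 0"
is constant-true even when n itself is unknown, e.g. for
n = count * sizeof(u64):

        #include <stddef.h>

        unsigned long copy_user_any(void *to, const void *from, size_t n);
        unsigned long copy_user_mult8(void *to, const void *from, size_t n);

        /* Hypothetical dispatcher: when the compiler can prove the
         * size is a multiple of 8, route to a variant that can do
         * straight word-at-a-time copies. */
        #define copy_user(to, from, n)                                  \
                ((__builtin_constant_p((n) % 8 == 0) && ((n) % 8 == 0)) \
                        ? copy_user_mult8((to), (from), (n))            \
                        : copy_user_any((to), (from), (n)))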

Admittedly my main use case for that was the big fixed-size case in
the stat code, which just does

        return copy_to_user(statbuf, &tmp, sizeof(tmp)) ? -EFAULT : 0;

and sticks out like a sore thumb on some benchmarks.

But I've always hated the patch, because it turns out that the real
fix is to get rid of the temporary buffer on the stack and just copy
the fields one-by-one with 'unsafe_put_user()' and friends. But every
time I do that - and I've done it several times - I end up throwing
the patch away, because it ends up being problematic on non-x86
architectures.

(Reason: INIT_STRUCT_STAT64_PADDING(). Ugh).
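For reference, the field-by-field shape is roughly this - a hedged
sketch of the unsafe_put_user() pattern (kernel context, assumes
<linux/uaccess.h> and <linux/stat.h>), not the actual fs/stat.c code:

        static int cp_new_stat_sketch(struct kstat *stat,
                                      struct stat __user *statbuf)
        {
                if (!user_access_begin(statbuf, sizeof(*statbuf)))
                        return -EFAULT;
                /* No stack temporary: each field goes straight out to
                 * user space; unsafe_put_user() branches to the given
                 * label on a fault. */
                unsafe_put_user(stat->ino, &statbuf->st_ino, Efault);
                unsafe_put_user(stat->size, &statbuf->st_size, Efault);
                /* ... remaining fields, plus whatever arch-specific
                 * padding INIT_STRUCT_STAT64_PADDING() would clear ... */
                user_access_end();
                return 0;
        Efault:
                user_access_end();
                return -EFAULT;
        }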

              Linus
