Message-ID: <CAHk-=wgk+upuXn7-wsDs4psxOJO4wW7G2g-Sxvv0axCibFua1w@mail.gmail.com>
Date: Wed, 2 Apr 2025 09:21:47 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: mingo@...hat.com, x86@...nel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] x86: prevent gcc from emitting rep movsq/stosq for
 inlined ops

On Wed, 2 Apr 2025 at 06:42, Mateusz Guzik <mjguzik@...il.com> wrote:
>
>
> +ifdef CONFIG_CC_IS_GCC
> +#
> +# Inline memcpy and memset handling policy for gcc.
> +#
> +# For ops of sizes known at compilation time it quickly resorts to issuing rep
> +# movsq and stosq. On most uarchs rep-prefixed ops have a significant startup
> +# latency and it is faster to issue regular stores (even if in loops) to handle
> +# small buffers.
> +#
> +# This of course comes at an expense in terms of i-cache footprint. bloat-o-meter
> +# reported 0.23% increase for enabling these.
> +#
> +# We inline up to 256 bytes, which in the best case issues few movs, in the
> +# worst case creates a 4 * 8 store loop.
> +#
> +# The upper limit was chosen semi-arbitrarily -- uarchs wildly differ between a
> +# threshold past which a rep-prefixed op becomes faster, 256 being the lowest
> +# common denominator. Someone(tm) should revisit this from time to time.
> +#
> +KBUILD_CFLAGS += -mmemcpy-strategy=unrolled_loop:256:noalign,libcall:-1:noalign
> +KBUILD_CFLAGS += -mmemset-strategy=unrolled_loop:256:noalign,libcall:-1:noalign
> +endif

Please make this a gcc bug-report instead - I really don't want to
have random compiler-specific tuning options in the kernel.

Because that whole memcpy-strategy thing is something that gets tuned
by a lot of other compiler options (i.e. -march and different versions).

             Linus
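
For anyone who wants to see the code generation being discussed, here is
a minimal sketch of a test case. The struct, its 200-byte size and the
function names are hypothetical and chosen only for illustration; the two
strategy options are copied from the quoted patch. Whether gcc emits
rep movsq/stosq for this at all is likely to depend on the gcc version
and on -march/-mtune, so the result should be checked per toolchain
rather than assumed.

```c
#include <string.h>

/*
 * Hypothetical 200-byte object: a size known at compile time and below
 * the 256-byte limit used in the quoted patch.
 */
struct blob {
	unsigned char bytes[200];
};

/* Constant-size clear: a candidate for inline expansion by the compiler. */
void blob_clear(struct blob *b)
{
	memset(b, 0, sizeof(*b));
}

/* Constant-size copy: likewise a candidate for inline expansion. */
void blob_copy(struct blob *dst, const struct blob *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/*
 * To compare the generated code on x86-64 (file names are assumptions):
 *
 *   gcc -O2 -S blob.c -o default.s
 *   gcc -O2 -S blob.c -o tuned.s \
 *       -mmemcpy-strategy=unrolled_loop:256:noalign,libcall:-1:noalign \
 *       -mmemset-strategy=unrolled_loop:256:noalign,libcall:-1:noalign
 *
 * Then look for "rep movsq" / "rep stosq" in each output file.
 */
```

The functions behave identically under either set of options; only the
instruction sequence chosen for the inline expansion can differ.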
