Open Source and information security mailing list archives
 
Message-ID: <20250306203714.118ead69@pumpkin>
Date: Thu, 6 Mar 2025 20:37:14 +0000
From: David Laight <david.laight.linux@...il.com>
To: Ingo Molnar <mingo@...nel.org>
Cc: Uros Bizjak <ubizjak@...il.com>, Peter Zijlstra <peterz@...radead.org>,
 Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...el.com>,
 x86@...nel.org, linux-kernel@...r.kernel.org, Thomas Gleixner
 <tglx@...utronix.de>, Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter
 Anvin" <hpa@...or.com>, Linus Torvalds <torvalds@...uxfoundation.org>,
 Linus Torvalds <torvalds@...ux-foundation.org>, Arnd Bergmann
 <arnd@...db.de>
Subject: Re: kernel: Current status of CONFIG_CC_OPTIMIZE_FOR_SIZE=y (was:
 Re: [PATCH -tip] x86/locking/atomic: Use asm_inline for atomic locking
 insns)

On Thu, 6 Mar 2025 10:43:26 +0100
Ingo Molnar <mingo@...nel.org> wrote:

> * Uros Bizjak <ubizjak@...il.com> wrote:
...
> And this one by Linus, 14 years ago:
> 
>   =================>  
>   281dc5c5ec0f ("Give up on pushing CC_OPTIMIZE_FOR_SIZE")
>   =================>  
> 
>   From: Linus Torvalds <torvalds@...ux-foundation.org>
>   Date: Sun, 22 May 2011 14:30:36 -0700
>   Subject: [PATCH] Give up on pushing CC_OPTIMIZE_FOR_SIZE
> 
>     I still happen to believe that I$ miss costs are a major thing, but
>     sadly, -Os doesn't seem to be the solution.  With or without it, gcc
>     will miss some obvious code size improvements, and with it enabled gcc
>     will sometimes make choices that aren't good even with high I$ miss
>     ratios.
> 
>     For example, with -Os, gcc on x86 will turn a 20-byte constant memcpy
>     into a "rep movsl".  While I sincerely hope that x86 CPU's will some day
>     do a good job at that, they certainly don't do it yet, and the cost is
>     higher than a L1 I$ miss would be.

Well, 'rep movsb' is a lot better than it was then.
Even on Sandy Bridge (IIRC) it takes ~20 clocks for short transfers (of
any length), unlike the P4 with its ~140-clock overhead!
It is still slower for short fixed sizes, but probably good for anything
variable-length, because of the cost of the function call and of the
conditionals needed to select the 'best' algorithm.
OTOH, if you know it is only a few bytes, a code loop may be best - and
gcc will convert it to a memcpy() call for you!

The really silly one was 'push imm8; pop reg' to get a sign-extended value.

But I do remember -O2 being smaller than -Oz!
Just changing the inlining thresholds and the code replication on loops
(and never unrolling loops) would probably be a good start.

	David
