Message-Id: <6c363c6f-7152-4d09-96db-861eda759a35@app.fastmail.com>
Date: Mon, 28 Apr 2025 13:21:11 +0200
From: "Arnd Bergmann" <arnd@...nel.org>
To: "Ingo Molnar" <mingo@...nel.org>
Cc: "H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org,
 "Ahmed S . Darwish" <darwi@...utronix.de>,
 "Andrew Cooper" <andrew.cooper3@...rix.com>,
 "Ard Biesheuvel" <ardb@...nel.org>, "Borislav Petkov" <bp@...en8.de>,
 "Dave Hansen" <dave.hansen@...ux.intel.com>,
 "John Ogness" <john.ogness@...utronix.de>,
 "Linus Torvalds" <torvalds@...ux-foundation.org>,
 "Peter Zijlstra" <peterz@...radead.org>,
 "Thomas Gleixner" <tglx@...utronix.de>
Subject: Re: [PATCH 13/15] x86/cpu: Make CONFIG_X86_CX8 unconditional

On Mon, Apr 28, 2025, at 11:16, Ingo Molnar wrote:
> * Arnd Bergmann <arnd@...nel.org> wrote:
>> 
>> b) always build with -march=i586 and leave only the -mtune
>>    flags; see if anyone cares enough to even benchmark
>>    and pick one of the other options if they can show
>>    a meaningful regression over -march=i686 -mtune=
>
> That's actually a good idea IMO. I looked at the code generation with 
> current compilers and it turns out that M686 is *substantially* worse 
> in code generation than M586, as apparently the extra CMOV instructions 
> bloat up the generated code:
>
>       text	   data	    bss	    dec	    hex	filename
>   15427023	7601010	1744896	24772929	17a0141	vmlinux.M586
>   16578295	7598826	1744896	25922017	18b89e1	vmlinux.M686
>
>  - +7.5% increase in text size (5.6% according to bloatometer),
>  - +2% increase in instruction count,
>  - while number of branches increases by +1.3%.
>
> But it's not about CMOV: I checked about a dozen functions that end up 
> using CMOV, and the 'conditional' part of CMOV does seem to reduce 
> branches for those functions by a minor degree and ends up reducing 
> their size as well. So CMOV helps, a bit.
>
> The substantial code bloat comes from some other aspect of GCC's 
> march=i686 flag ... I bet it's primarily inlining: there's a 0.7% 
> reduction in number of calls done.

I had tried the same thing already, but saw a different result.
For me, the i686 output is 0.2% smaller than the i586 one (both
-mtune=generic) using gcc-14.2, or just 0.1% smaller with clang-21,
which is roughly what I expected:

   text	   data	    bss	    dec	    hex	filename
7454055	4158218	1695744	13308017	 cb1071	build/tmp/vmlinux-i586
7433427	4154146	1695744	13283317	 caaff5	build/tmp/vmlinux-i686
7318514	4052573	1687552	13058639	 c7424f	build/tmp/vmlinux-i586-clang
7309938	4052573	1687552	13050063	 c720cf	build/tmp/vmlinux-i686-clang

I do see a larger difference with other -mtune= options. Here is
the same config built with "clang-21 -march=i586 -mtune=i686" instead
of "-march=i586 -mtune=generic":

7254510	4056669	1687552	12998731	 c6584b	build/tmp/vmlinux
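
For anyone who wants to reproduce the percentages, here is a minimal
sketch that parses the size(1) rows quoted above and prints relative
changes. The helper names (parse_size, pct_change) are made up for
illustration; the numbers are copied verbatim from this mail, and the
last row is the "-march=i586 -mtune=i686" clang build.

```python
# Rows copied from the size(1) output in this mail (whitespace-separated).
SIZE_OUTPUT = """\
   text    data     bss      dec     hex filename
7454055 4158218 1695744 13308017  cb1071 build/tmp/vmlinux-i586
7433427 4154146 1695744 13283317  caaff5 build/tmp/vmlinux-i686
7318514 4052573 1687552 13058639  c7424f build/tmp/vmlinux-i586-clang
7309938 4052573 1687552 13050063  c720cf build/tmp/vmlinux-i686-clang
7254510 4056669 1687552 12998731  c6584b build/tmp/vmlinux
"""


def parse_size(output):
    """Map filename -> {'text', 'data', 'bss', 'dec'} from size(1) output."""
    rows = {}
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue
        text, data, bss, dec = (int(f) for f in fields[:4])
        rows[fields[5]] = {"text": text, "data": data, "bss": bss, "dec": dec}
    return rows


def pct_change(base, new):
    """Relative change from base to new, in percent (negative = smaller)."""
    return (new - base) / base * 100.0


rows = parse_size(SIZE_OUTPUT)

# Pairs to compare: (label, baseline file, candidate file).
comparisons = [
    ("gcc   i586 -> i686", "build/tmp/vmlinux-i586", "build/tmp/vmlinux-i686"),
    ("clang i586 -> i686", "build/tmp/vmlinux-i586-clang",
     "build/tmp/vmlinux-i686-clang"),
    ("clang -mtune=generic -> -mtune=i686", "build/tmp/vmlinux-i586-clang",
     "build/tmp/vmlinux"),
]

for label, base, new in comparisons:
    for column in ("text", "dec"):
        delta = pct_change(rows[base][column], rows[new][column])
        print(f"{label:38s} {column:4s} {delta:+.2f}%")
```

Note that the result depends on which column is compared: the text
column alone and the dec total (text + data + bss) give somewhat
different percentages for the same pair of builds.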

There is a good chance that the -mtune= optimizations totally
dwarf cmov, not just in code size but also in actual
performance. The bit I'm unsure about is whether we still need
to worry about any core where this is not the case (I'm guessing
not, but I have no way to prove that).

      Arnd
