linux-kernel - Re: [PATCH v3] uapi: Revert "bitops: avoid integer overflow in GENMASK(

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Z765-uSYQb986RTZ@vaxr-BM6660-BM6360>
Date: Wed, 26 Feb 2025 14:51:38 +0800
From: I Hsin Cheng <richard120310@...il.com>
To: yury.norov@...il.com
Cc: anshuman.khandual@....com, arnd@...db.de, linux-kernel@...r.kernel.org,
	jserv@...s.ncku.edu.tw, skhan@...uxfoundation.org, mka@...omium.org,
	akpm@...ux-foundation.org
Subject: Re: [PATCH v3] uapi: Revert "bitops: avoid integer overflow in
 GENMASK(_ULL)"

On Wed, Feb 26, 2025 at 12:54:37AM +0800, I Hsin Cheng wrote:
> This patch reverts 'commit c32ee3d9abd2("bitops: avoid integer overflow in
>  GENMASK(_ULL)")'.
> 
> The code generation can be shrink by over 1KB by reverting this commit.
> Originally the commit claimed that clang would emit warnings using the
> implementation at that time.
> 
> The patch was applied and tested against numerous compilers, inclduing
> gcc-13, gcc-12, gcc-11 cross-compiler, clang-17, clang-18 and clang-19.
> Various warning levels were set (-W=0, -W=1, -W=2) and CONFIG_WERROR
> disabled to complete the compilation. The results show that no compilation
> errors or warnings were generated due to the patch.
> 
> The results of code size reduction are summarized in the following table.
> The code size changes for clang are all zero across different versions,
> so they're not listed in the table.
> 
> For NR_CPUS=64 on x86_64.
> ----------------------------------------------
> |	        |   gcc-13 |   gcc-12 |   gcc-11 |
> ----------------------------------------------
> |       old | 22438085 | 22453915 | 22302033 |
> ----------------------------------------------
> |       new | 22436816 | 22452913 | 22300826 |
> ----------------------------------------------
> | new - old |    -1269 |    -1002 |    -1207 |
> ----------------------------------------------
> 
> For NR_CPUS=1024 on x86_64.
> ----------------------------------------------
> |	        |   gcc-13 |   gcc-12 |   gcc-11 |
> ----------------------------------------------
> |       old | 22493682 | 22509812 | 22357661 |
> ----------------------------------------------
> |       new | 22493230 | 22509487 | 22357250 |
> ----------------------------------------------
> | new - old |     -452 |     -325 |     -411 |
> ----------------------------------------------
> 
> For arm64 architecture, gcc cross-compiler was used and QEMU was
> utilized to execute a VM for a CPU-heavy workload to ensure no
> side effects and that functionalities remained correct. The test
> even demonstrated a positive result in terms of code size reduction:
> * Before: 31660668
> * After: 31658724
> * Difference (After - Before): -1944
> 
> An analysis of multiple functions compiled with gcc-13 on x86_64 was
> performed. In summary, the patch elimates one negation in almost
> every use case. However, negative effects may occur in some cases,
> such as the generation of additional "mov" instruction or increased
> register usage> The use of "~_UL(0) << (l)" may even result in the
> allocations of "%r*" registers instead of "%e*" registers (which are
> 32-bit registers) because the compiler cannot assume that the higher
> bits are zero.
> 
> Signed-off-by: I Hsin Cheng <richard120310@...il.com>
> ---
> Changelog:
> 
> v1 -> v2:
> 	- Refer the patch explicitly as a revert of commit c32ee3d
> 	- Squash commits into 1 commit
> 	- Perform compilation for numerous compilers, architectures and
> 	  compare code size shrink for each of them
> 	- Perform cpu-heavy workload test inside x86_64 VM and ARM64 VM
> 	- Analyze the result of disassembly of several small use cases
> 
> v2 -> v3:
> 	- Refactor code to single line
> 	- Disabled CONFIG_WERROR to ensure no additional compilation errors or warnings
> 	- Remove description for unrelated compilation errors and warnings
> 	- Summarize the result into a better looking table
> 	- Rephrase the patch and fix typos
> 
> 
> Hi Yury,
> 
> Sorry about the last iteration that I included everything, making
> the email too large and difficult to read. However, you still reviewed
> it and gave me feedbacks, really big thanks for your patience and those
> feedbacks. Running these tests also gave me a great opportunity to learn
> a lot.
> 
> If there's anything else needed to be test or modified, please let
> me know, I'll ammend them as soon as possible.
> 
> Hi David,
> 
> Thanks for your advise on alternative ideas for this code. I ran some
> simple test (basically check the result of code size reduction) based
> on your suggestions.
> 
> For gcc-13 on x86_64 + defconfig.
> * "(_UL(2) << (h)) - (_UL(1) << (l))"
> 	Before=22438085, After=22438193, Difference ( After - Before ) = 108
> * "((_UL(1) + _UL(1)) << (h)) - (_UL(1) << (l))"
> 	Before=22438085, After=22438209, Difference ( After - Before ) = 124
> 
> I tried to do an analysis on "intel_arch_events_quirk()", it only +2 in
> code size change, I think it would be a nice example to see the differences
> in generated code.
> 
> So the result shows that your proposal can save 1 negation and 1 "and".
> -ffffffff83278ad2:	48 f7 d8             	neg    %rax
> -ffffffff83278adc:	83 e0 7f             	and    $0x7f,%eax
> 
> However, one more "mov" and one more "sub" are generated.
> +ffffffff83278acf:	b8 80 00 00 00       	mov    $0x80,%eax
> +ffffffff83278ad7:	48 29 d0             	sub    %rdx,%rax
> 
> No change in total number of instructions, but negation only requires
> one register, and the "mov" generated is more expensive then usual.
> (Usually "mov" of the following form are generated,
> "48 89 ea             	mov    %rbp,%rdx",
> "89 c3                	mov    %eax,%ebx" ).
> 
> Thanks for your advise again, in some scenario it does have positive
> effect, but unfortunately, the overall impact is not beneficial.
> 
> Best regards,
> I Hsin Cheng.
> 
> Tests performed on ubuntu 24.04 with AMD Ryzen 7 5700X3D 8-Core
> Processor on x86_64 with kernel version v6.14.0-rc1.
> ---
>  include/uapi/linux/bits.h | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/include/uapi/linux/bits.h b/include/uapi/linux/bits.h
> index 5ee30f882736..682b406e1067 100644
> --- a/include/uapi/linux/bits.h
> +++ b/include/uapi/linux/bits.h
> @@ -4,13 +4,9 @@
>  #ifndef _UAPI_LINUX_BITS_H
>  #define _UAPI_LINUX_BITS_H
>  
> -#define __GENMASK(h, l) \
> -        (((~_UL(0)) - (_UL(1) << (l)) + 1) & \
> -         (~_UL(0) >> (__BITS_PER_LONG - 1 - (h))))
> +#define __GENMASK(h, l) (((~_UL(0)) << (l)) & (~_UL(0) >> (BITS_PER_LONG - 1 - (h))))
>  
> -#define __GENMASK_ULL(h, l) \
> -        (((~_ULL(0)) - (_ULL(1) << (l)) + 1) & \
> -         (~_ULL(0) >> (__BITS_PER_LONG_LONG - 1 - (h))))
> +#define __GENMASK_ULL(h, l) (((~_ULL(0)) << (l)) & (~_ULL(0) >> (BITS_PER_LONG_LONG - 1 - (h))))
>  
>  #define __GENMASK_U128(h, l) \
>  	((_BIT128((h)) << 1) - (_BIT128(l)))
> -- 
> 2.43.0
>

Please ignore this one, I missed to cc to David.
I'll resend again later.

Beset regards,
I Hsin Cheng