lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20250226065623.1567363-1-richard120310@gmail.com>
Date: Wed, 26 Feb 2025 14:56:23 +0800
From: I Hsin Cheng <richard120310@...il.com>
To: yury.norov@...il.com
Cc: david.laight.linux@...il.com,
	anshuman.khandual@....com,
	arnd@...db.de,
	linux-kernel@...r.kernel.org,
	jserv@...s.ncku.edu.tw,
	skhan@...uxfoundation.org,
	mka@...omium.org,
	akpm@...ux-foundation.org,
	I Hsin Cheng <richard120310@...il.com>
Subject: [PATCH v3 RESEND] uapi: Revert "bitops: avoid integer overflow in GENMASK(_ULL)"

This patch reverts 'commit c32ee3d9abd2("bitops: avoid integer overflow in
 GENMASK(_ULL)")'.

The code generation can be shrink by over 1KB by reverting this commit.
Originally the commit claimed that clang would emit warnings using the
implementation at that time.

The patch was applied and tested against numerous compilers, including
gcc-13, gcc-12, gcc-11 cross-compiler, clang-17, clang-18 and clang-19.
Various warning levels were set (-W=0, -W=1, -W=2) and CONFIG_WERROR
disabled to complete the compilation. The results show that no compilation
errors or warnings were generated due to the patch.

The results of code size reduction are summarized in the following table.
The code size changes for clang are all zero across different versions,
so they're not listed in the table.

For NR_CPUS=64 on x86_64.
----------------------------------------------
|	        |   gcc-13 |   gcc-12 |   gcc-11 |
----------------------------------------------
|       old | 22438085 | 22453915 | 22302033 |
----------------------------------------------
|       new | 22436816 | 22452913 | 22300826 |
----------------------------------------------
| new - old |    -1269 |    -1002 |    -1207 |
----------------------------------------------

For NR_CPUS=1024 on x86_64.
----------------------------------------------
|	        |   gcc-13 |   gcc-12 |   gcc-11 |
----------------------------------------------
|       old | 22493682 | 22509812 | 22357661 |
----------------------------------------------
|       new | 22493230 | 22509487 | 22357250 |
----------------------------------------------
| new - old |     -452 |     -325 |     -411 |
----------------------------------------------

For arm64 architecture, gcc cross-compiler was used and QEMU was
utilized to execute a VM for a CPU-heavy workload to ensure no
side effects and that functionalities remained correct. The test
even demonstrated a positive result in terms of code size reduction:
* Before: 31660668
* After: 31658724
* Difference (After - Before): -1944

An analysis of multiple functions compiled with gcc-13 on x86_64 was
performed. In summary, the patch elimates one negation in almost
every use case. However, negative effects may occur in some cases,
such as the generation of additional "mov" instruction or increased
register usage. The use of "~_UL(0) << (l)" may even result in the
allocations of "%r*" registers instead of "%e*" registers (which are
32-bit registers) because the compiler cannot assume that the higher
bits are zero.

Signed-off-by: I Hsin Cheng <richard120310@...il.com>
---
Changelog:

v1 -> v2:
	- Refer the patch explicitly as a revert of commit c32ee3d
	- Squash commits into 1 commit
	- Perform compilation for numerous compilers, architectures and
	  compare code size shrink for each of them
	- Perform cpu-heavy workload test inside x86_64 VM and ARM64 VM
	- Analyze the result of disassembly of several small use cases

v2 -> v3:
	- Refactor code to single line
	- Disabled CONFIG_WERROR to ensure no additional compilation errors or warnings
	- Remove description for unrelated compilation errors and warnings
	- Summarize the result into a better looking table
	- Rephrase the patch and fix typos


Hi Yury,

Sorry about the last iteration that I included everything, making
the email too large and difficult to read. However, you still reviewed
it and gave me feedbacks, really big thanks for your patience and those
feedbacks. Running these tests also gave me a great opportunity to learn
a lot.

If there's anything else needed to be test or modified, please let
me know, I'll ammend them as soon as possible.

Hi David,

Thanks for your advise on alternative ideas for this code. I ran some
simple test (basically check the result of code size reduction) based
on your suggestions.

For gcc-13 on x86_64 + defconfig.
* "(_UL(2) << (h)) - (_UL(1) << (l))"
	Before=22438085, After=22438193, Difference ( After - Before ) = 108
* "((_UL(1) + _UL(1)) << (h)) - (_UL(1) << (l))"
	Before=22438085, After=22438209, Difference ( After - Before ) = 124

I tried to do an analysis on "intel_arch_events_quirk()", it only +2 in
code size change, I think it would be a nice example to see the differences
in generated code.

So the result shows that your proposal can save 1 negation and 1 "and".
-ffffffff83278ad2:	48 f7 d8             	neg    %rax
-ffffffff83278adc:	83 e0 7f             	and    $0x7f,%eax

However, one more "mov" and one more "sub" are generated.
+ffffffff83278acf:	b8 80 00 00 00       	mov    $0x80,%eax
+ffffffff83278ad7:	48 29 d0             	sub    %rdx,%rax

No change in total number of instructions, but negation only requires
one register, and the "mov" generated is more expensive then usual.
(Usually "mov" of the following form are generated,
"48 89 ea             	mov    %rbp,%rdx",
"89 c3                	mov    %eax,%ebx" ).

Thanks for your advise again, in some scenario it does have positive
effect, but unfortunately, the overall impact is not beneficial.

Best regards,
I Hsin Cheng.

Tests performed on ubuntu 24.04 with AMD Ryzen 7 5700X3D 8-Core
Processor on x86_64 with kernel version v6.14.0-rc1.
---
 include/uapi/linux/bits.h | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/bits.h b/include/uapi/linux/bits.h
index 5ee30f882736..682b406e1067 100644
--- a/include/uapi/linux/bits.h
+++ b/include/uapi/linux/bits.h
@@ -4,13 +4,9 @@
 #ifndef _UAPI_LINUX_BITS_H
 #define _UAPI_LINUX_BITS_H
 
-#define __GENMASK(h, l) \
-        (((~_UL(0)) - (_UL(1) << (l)) + 1) & \
-         (~_UL(0) >> (__BITS_PER_LONG - 1 - (h))))
+#define __GENMASK(h, l) (((~_UL(0)) << (l)) & (~_UL(0) >> (BITS_PER_LONG - 1 - (h))))
 
-#define __GENMASK_ULL(h, l) \
-        (((~_ULL(0)) - (_ULL(1) << (l)) + 1) & \
-         (~_ULL(0) >> (__BITS_PER_LONG_LONG - 1 - (h))))
+#define __GENMASK_ULL(h, l) (((~_ULL(0)) << (l)) & (~_ULL(0) >> (BITS_PER_LONG_LONG - 1 - (h))))
 
 #define __GENMASK_U128(h, l) \
 	((_BIT128((h)) << 1) - (_BIT128(l)))
-- 
2.43.0


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ