[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20250225165437.1554395-1-richard120310@gmail.com>
Date: Wed, 26 Feb 2025 00:54:37 +0800
From: I Hsin Cheng <richard120310@...il.com>
To: yury.norov@...il.com
Cc: anshuman.khandual@....com,
arnd@...db.de,
linux-kernel@...r.kernel.org,
jserv@...s.ncku.edu.tw,
skhan@...uxfoundation.org,
mka@...omium.org,
akpm@...ux-foundation.org,
I Hsin Cheng <richard120310@...il.com>
Subject: [PATCH v3] uapi: Revert "bitops: avoid integer overflow in GENMASK(_ULL)"
This patch reverts 'commit c32ee3d9abd2("bitops: avoid integer overflow in
GENMASK(_ULL)")'.
The code generation can be shrink by over 1KB by reverting this commit.
Originally the commit claimed that clang would emit warnings using the
implementation at that time.
The patch was applied and tested against numerous compilers, inclduing
gcc-13, gcc-12, gcc-11 cross-compiler, clang-17, clang-18 and clang-19.
Various warning levels were set (-W=0, -W=1, -W=2) and CONFIG_WERROR
disabled to complete the compilation. The results show that no compilation
errors or warnings were generated due to the patch.
The results of code size reduction are summarized in the following table.
The code size changes for clang are all zero across different versions,
so they're not listed in the table.
For NR_CPUS=64 on x86_64.
----------------------------------------------
| | gcc-13 | gcc-12 | gcc-11 |
----------------------------------------------
| old | 22438085 | 22453915 | 22302033 |
----------------------------------------------
| new | 22436816 | 22452913 | 22300826 |
----------------------------------------------
| new - old | -1269 | -1002 | -1207 |
----------------------------------------------
For NR_CPUS=1024 on x86_64.
----------------------------------------------
| | gcc-13 | gcc-12 | gcc-11 |
----------------------------------------------
| old | 22493682 | 22509812 | 22357661 |
----------------------------------------------
| new | 22493230 | 22509487 | 22357250 |
----------------------------------------------
| new - old | -452 | -325 | -411 |
----------------------------------------------
For arm64 architecture, gcc cross-compiler was used and QEMU was
utilized to execute a VM for a CPU-heavy workload to ensure no
side effects and that functionalities remained correct. The test
even demonstrated a positive result in terms of code size reduction:
* Before: 31660668
* After: 31658724
* Difference (After - Before): -1944
An analysis of multiple functions compiled with gcc-13 on x86_64 was
performed. In summary, the patch elimates one negation in almost
every use case. However, negative effects may occur in some cases,
such as the generation of additional "mov" instruction or increased
register usage> The use of "~_UL(0) << (l)" may even result in the
allocations of "%r*" registers instead of "%e*" registers (which are
32-bit registers) because the compiler cannot assume that the higher
bits are zero.
Signed-off-by: I Hsin Cheng <richard120310@...il.com>
---
Changelog:
v1 -> v2:
- Refer the patch explicitly as a revert of commit c32ee3d
- Squash commits into 1 commit
- Perform compilation for numerous compilers, architectures and
compare code size shrink for each of them
- Perform cpu-heavy workload test inside x86_64 VM and ARM64 VM
- Analyze the result of disassembly of several small use cases
v2 -> v3:
- Refactor code to single line
- Disabled CONFIG_WERROR to ensure no additional compilation errors or warnings
- Remove description for unrelated compilation errors and warnings
- Summarize the result into a better looking table
- Rephrase the patch and fix typos
Hi Yury,
Sorry about the last iteration that I included everything, making
the email too large and difficult to read. However, you still reviewed
it and gave me feedbacks, really big thanks for your patience and those
feedbacks. Running these tests also gave me a great opportunity to learn
a lot.
If there's anything else needed to be test or modified, please let
me know, I'll ammend them as soon as possible.
Hi David,
Thanks for your advise on alternative ideas for this code. I ran some
simple test (basically check the result of code size reduction) based
on your suggestions.
For gcc-13 on x86_64 + defconfig.
* "(_UL(2) << (h)) - (_UL(1) << (l))"
Before=22438085, After=22438193, Difference ( After - Before ) = 108
* "((_UL(1) + _UL(1)) << (h)) - (_UL(1) << (l))"
Before=22438085, After=22438209, Difference ( After - Before ) = 124
I tried to do an analysis on "intel_arch_events_quirk()", it only +2 in
code size change, I think it would be a nice example to see the differences
in generated code.
So the result shows that your proposal can save 1 negation and 1 "and".
-ffffffff83278ad2: 48 f7 d8 neg %rax
-ffffffff83278adc: 83 e0 7f and $0x7f,%eax
However, one more "mov" and one more "sub" are generated.
+ffffffff83278acf: b8 80 00 00 00 mov $0x80,%eax
+ffffffff83278ad7: 48 29 d0 sub %rdx,%rax
No change in total number of instructions, but negation only requires
one register, and the "mov" generated is more expensive then usual.
(Usually "mov" of the following form are generated,
"48 89 ea mov %rbp,%rdx",
"89 c3 mov %eax,%ebx" ).
Thanks for your advise again, in some scenario it does have positive
effect, but unfortunately, the overall impact is not beneficial.
Best regards,
I Hsin Cheng.
Tests performed on ubuntu 24.04 with AMD Ryzen 7 5700X3D 8-Core
Processor on x86_64 with kernel version v6.14.0-rc1.
---
include/uapi/linux/bits.h | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/include/uapi/linux/bits.h b/include/uapi/linux/bits.h
index 5ee30f882736..682b406e1067 100644
--- a/include/uapi/linux/bits.h
+++ b/include/uapi/linux/bits.h
@@ -4,13 +4,9 @@
#ifndef _UAPI_LINUX_BITS_H
#define _UAPI_LINUX_BITS_H
-#define __GENMASK(h, l) \
- (((~_UL(0)) - (_UL(1) << (l)) + 1) & \
- (~_UL(0) >> (__BITS_PER_LONG - 1 - (h))))
+#define __GENMASK(h, l) (((~_UL(0)) << (l)) & (~_UL(0) >> (BITS_PER_LONG - 1 - (h))))
-#define __GENMASK_ULL(h, l) \
- (((~_ULL(0)) - (_ULL(1) << (l)) + 1) & \
- (~_ULL(0) >> (__BITS_PER_LONG_LONG - 1 - (h))))
+#define __GENMASK_ULL(h, l) (((~_ULL(0)) << (l)) & (~_ULL(0) >> (BITS_PER_LONG_LONG - 1 - (h))))
#define __GENMASK_U128(h, l) \
((_BIT128((h)) << 1) - (_BIT128(l)))
--
2.43.0
Powered by blists - more mailing lists