[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Z765-uSYQb986RTZ@vaxr-BM6660-BM6360>
Date: Wed, 26 Feb 2025 14:51:38 +0800
From: I Hsin Cheng <richard120310@...il.com>
To: yury.norov@...il.com
Cc: anshuman.khandual@....com, arnd@...db.de, linux-kernel@...r.kernel.org,
jserv@...s.ncku.edu.tw, skhan@...uxfoundation.org, mka@...omium.org,
akpm@...ux-foundation.org
Subject: Re: [PATCH v3] uapi: Revert "bitops: avoid integer overflow in
GENMASK(_ULL)"
On Wed, Feb 26, 2025 at 12:54:37AM +0800, I Hsin Cheng wrote:
> This patch reverts 'commit c32ee3d9abd2("bitops: avoid integer overflow in
> GENMASK(_ULL)")'.
>
> The code generation can be shrink by over 1KB by reverting this commit.
> Originally the commit claimed that clang would emit warnings using the
> implementation at that time.
>
> The patch was applied and tested against numerous compilers, inclduing
> gcc-13, gcc-12, gcc-11 cross-compiler, clang-17, clang-18 and clang-19.
> Various warning levels were set (-W=0, -W=1, -W=2) and CONFIG_WERROR
> disabled to complete the compilation. The results show that no compilation
> errors or warnings were generated due to the patch.
>
> The results of code size reduction are summarized in the following table.
> The code size changes for clang are all zero across different versions,
> so they're not listed in the table.
>
> For NR_CPUS=64 on x86_64.
> ----------------------------------------------
> | | gcc-13 | gcc-12 | gcc-11 |
> ----------------------------------------------
> | old | 22438085 | 22453915 | 22302033 |
> ----------------------------------------------
> | new | 22436816 | 22452913 | 22300826 |
> ----------------------------------------------
> | new - old | -1269 | -1002 | -1207 |
> ----------------------------------------------
>
> For NR_CPUS=1024 on x86_64.
> ----------------------------------------------
> | | gcc-13 | gcc-12 | gcc-11 |
> ----------------------------------------------
> | old | 22493682 | 22509812 | 22357661 |
> ----------------------------------------------
> | new | 22493230 | 22509487 | 22357250 |
> ----------------------------------------------
> | new - old | -452 | -325 | -411 |
> ----------------------------------------------
>
> For arm64 architecture, gcc cross-compiler was used and QEMU was
> utilized to execute a VM for a CPU-heavy workload to ensure no
> side effects and that functionalities remained correct. The test
> even demonstrated a positive result in terms of code size reduction:
> * Before: 31660668
> * After: 31658724
> * Difference (After - Before): -1944
>
> An analysis of multiple functions compiled with gcc-13 on x86_64 was
> performed. In summary, the patch elimates one negation in almost
> every use case. However, negative effects may occur in some cases,
> such as the generation of additional "mov" instruction or increased
> register usage> The use of "~_UL(0) << (l)" may even result in the
> allocations of "%r*" registers instead of "%e*" registers (which are
> 32-bit registers) because the compiler cannot assume that the higher
> bits are zero.
>
> Signed-off-by: I Hsin Cheng <richard120310@...il.com>
> ---
> Changelog:
>
> v1 -> v2:
> - Refer the patch explicitly as a revert of commit c32ee3d
> - Squash commits into 1 commit
> - Perform compilation for numerous compilers, architectures and
> compare code size shrink for each of them
> - Perform cpu-heavy workload test inside x86_64 VM and ARM64 VM
> - Analyze the result of disassembly of several small use cases
>
> v2 -> v3:
> - Refactor code to single line
> - Disabled CONFIG_WERROR to ensure no additional compilation errors or warnings
> - Remove description for unrelated compilation errors and warnings
> - Summarize the result into a better looking table
> - Rephrase the patch and fix typos
>
>
> Hi Yury,
>
> Sorry about the last iteration that I included everything, making
> the email too large and difficult to read. However, you still reviewed
> it and gave me feedbacks, really big thanks for your patience and those
> feedbacks. Running these tests also gave me a great opportunity to learn
> a lot.
>
> If there's anything else needed to be test or modified, please let
> me know, I'll ammend them as soon as possible.
>
> Hi David,
>
> Thanks for your advise on alternative ideas for this code. I ran some
> simple test (basically check the result of code size reduction) based
> on your suggestions.
>
> For gcc-13 on x86_64 + defconfig.
> * "(_UL(2) << (h)) - (_UL(1) << (l))"
> Before=22438085, After=22438193, Difference ( After - Before ) = 108
> * "((_UL(1) + _UL(1)) << (h)) - (_UL(1) << (l))"
> Before=22438085, After=22438209, Difference ( After - Before ) = 124
>
> I tried to do an analysis on "intel_arch_events_quirk()", it only +2 in
> code size change, I think it would be a nice example to see the differences
> in generated code.
>
> So the result shows that your proposal can save 1 negation and 1 "and".
> -ffffffff83278ad2: 48 f7 d8 neg %rax
> -ffffffff83278adc: 83 e0 7f and $0x7f,%eax
>
> However, one more "mov" and one more "sub" are generated.
> +ffffffff83278acf: b8 80 00 00 00 mov $0x80,%eax
> +ffffffff83278ad7: 48 29 d0 sub %rdx,%rax
>
> No change in total number of instructions, but negation only requires
> one register, and the "mov" generated is more expensive then usual.
> (Usually "mov" of the following form are generated,
> "48 89 ea mov %rbp,%rdx",
> "89 c3 mov %eax,%ebx" ).
>
> Thanks for your advise again, in some scenario it does have positive
> effect, but unfortunately, the overall impact is not beneficial.
>
> Best regards,
> I Hsin Cheng.
>
> Tests performed on ubuntu 24.04 with AMD Ryzen 7 5700X3D 8-Core
> Processor on x86_64 with kernel version v6.14.0-rc1.
> ---
> include/uapi/linux/bits.h | 8 ++------
> 1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/include/uapi/linux/bits.h b/include/uapi/linux/bits.h
> index 5ee30f882736..682b406e1067 100644
> --- a/include/uapi/linux/bits.h
> +++ b/include/uapi/linux/bits.h
> @@ -4,13 +4,9 @@
> #ifndef _UAPI_LINUX_BITS_H
> #define _UAPI_LINUX_BITS_H
>
> -#define __GENMASK(h, l) \
> - (((~_UL(0)) - (_UL(1) << (l)) + 1) & \
> - (~_UL(0) >> (__BITS_PER_LONG - 1 - (h))))
> +#define __GENMASK(h, l) (((~_UL(0)) << (l)) & (~_UL(0) >> (BITS_PER_LONG - 1 - (h))))
>
> -#define __GENMASK_ULL(h, l) \
> - (((~_ULL(0)) - (_ULL(1) << (l)) + 1) & \
> - (~_ULL(0) >> (__BITS_PER_LONG_LONG - 1 - (h))))
> +#define __GENMASK_ULL(h, l) (((~_ULL(0)) << (l)) & (~_ULL(0) >> (BITS_PER_LONG_LONG - 1 - (h))))
>
> #define __GENMASK_U128(h, l) \
> ((_BIT128((h)) << 1) - (_BIT128(l)))
> --
> 2.43.0
>
Please ignore this one, I missed to cc to David.
I'll resend again later.
Beset regards,
I Hsin Cheng
Powered by blists - more mailing lists