linux-kernel - Re: [PATCH net-next V2] net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs

Message-ID: <d259ffa9-6c9e-488f-a64f-81025deba75c@nvidia.com>
Date: Tue, 16 Sep 2025 11:39:06 +0300
From: Patrisious Haddad <phaddad@...dia.com>
To: Nathan Chancellor <nathan@...nel.org>, Jason Gunthorpe <jgg@...dia.com>
Cc: Tariq Toukan <tariqt@...dia.com>,
 Catalin Marinas <catalin.marinas@....com>, Eric Dumazet
 <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>, Andrew Lunn <andrew+netdev@...n.ch>,
 "David S. Miller" <davem@...emloft.net>, Saeed Mahameed <saeedm@...dia.com>,
 Leon Romanovsky <leon@...nel.org>, Mark Bloch <mbloch@...dia.com>,
 Sabrina Dubroca <sd@...asysnail.net>, netdev@...r.kernel.org,
 linux-rdma@...r.kernel.org, linux-kernel@...r.kernel.org,
 Gal Pressman <gal@...dia.com>, Leon Romanovsky <leonro@...dia.com>,
 Michael Guralnik <michaelgur@...dia.com>, Moshe Shemesh <moshe@...dia.com>,
 Will Deacon <will@...nel.org>, Alexander Gordeev <agordeev@...ux.ibm.com>,
 Andrew Morton <akpm@...ux-foundation.org>,
 Christian Borntraeger <borntraeger@...ux.ibm.com>,
 Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
 Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
 Vasily Gorbik <gor@...ux.ibm.com>, Heiko Carstens <hca@...ux.ibm.com>,
 "H. Peter Anvin" <hpa@...or.com>, Justin Stitt <justinstitt@...gle.com>,
 linux-s390@...r.kernel.org, llvm@...ts.linux.dev,
 Ingo Molnar <mingo@...hat.com>, Bill Wendling <morbo@...gle.com>,
 Nick Desaulniers <ndesaulniers@...gle.com>,
 Salil Mehta <salil.mehta@...wei.com>, Sven Schnelle <svens@...ux.ibm.com>,
 Thomas Gleixner <tglx@...utronix.de>, x86@...nel.org,
 Yisen Zhuang <yisen.zhuang@...wei.com>, Arnd Bergmann <arnd@...db.de>,
 Leon Romanovsky <leonro@...lanox.com>, linux-arch@...r.kernel.org,
 linux-arm-kernel@...ts.infradead.org, Mark Rutland <mark.rutland@....com>,
 Michael Guralnik <michaelgur@...lanox.com>, patches@...ts.linux.dev,
 Niklas Schnelle <schnelle@...ux.ibm.com>, Jijie Shao <shaojijie@...wei.com>
Subject: Re: [PATCH net-next V2] net/mlx5: Improve write-combining test
 reliability for ARM64 Grace CPUs


On 9/16/2025 2:15 AM, Nathan Chancellor wrote:
>
> On Mon, Sep 15, 2025 at 07:48:10PM -0300, Jason Gunthorpe wrote:
>> On Mon, Sep 15, 2025 at 03:27:58PM -0700, Nathan Chancellor wrote:
>>> On Mon, Sep 15, 2025 at 03:18:59PM -0700, Nathan Chancellor wrote:
>>>> On Mon, Sep 15, 2025 at 11:35:08AM +0300, Tariq Toukan wrote:
>>>> ...
>>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>>>>> index d77696f46eb5..06d0eb190816 100644
>>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>>>>> @@ -176,3 +176,9 @@ mlx5_core-$(CONFIG_PCIE_TPH) += lib/st.o
>>>>>
>>>>>   obj-$(CONFIG_MLX5_DPLL) += mlx5_dpll.o
>>>>>   mlx5_dpll-y := dpll.o
>>>>> +
>>>>> +#
>>>>> +# NEON WC specific for mlx5
>>>>> +#
>>>>> +mlx5_core-$(CONFIG_KERNEL_MODE_NEON) += lib/wc_neon_iowrite64_copy.o
>>>>> +FLAGS_lib/wc_neon_iowrite64_copy.o += $(CC_FLAGS_FPU)
>>>> Does this work as is? I think this needs to be CFLAGS instead of FLAGS
>>>> but I did not test to verify.
>>> Also, Documentation/core-api/floating-point.rst states that code should
>>> also use CFLAGS_REMOVE_ for CC_FLAGS_NO_FPU as well as adding
>>> CC_FLAGS_FPU.
>>>
>>>    CFLAGS_REMOVE_lib/wc_neon_iowrite64_copy.o += $(CC_FLAGS_NO_FPU)
>> I wondered if you needed the separate compilation unit at all since it
>> is all done with inline assembly. Since the makefile seems to have a
>> typo, it suggests you don't need the compilation unit and it could
>> just be a little inline protected by CONFIG_KERNEL_MODE_NEON.

There is a difference between what merely compiles and the effect these
flags have on the generated assembly and on performance. To avoid
finding that out the hard way, I prefer to stick to the documentation,
which does as Nathan describes below: a separate compilation unit,
called between begin and end, built with the correct flags. That is
what I eventually tested. I failed to re-test after finishing my code
review, thinking my changes were only cosmetic ...

> Hmmm, clang rejects the current patch
>
>    drivers/net/ethernet/mellanox/mlx5/core/lib/wc_neon_iowrite64_copy.c:9:3: error: instruction requires: neon
>        9 |         ("ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%0]\n\t"
>          |          ^
>    <inline asm>:1:2: note: instantiated into assembly here
>        1 |         ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x19]
>          |         ^
>    drivers/net/ethernet/mellanox/mlx5/core/lib/wc_neon_iowrite64_copy.c:9:48: error: instruction requires: neon
>        9 |         ("ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%0]\n\t"
>          |                                                       ^
>    <inline asm>:2:2: note: instantiated into assembly here
>        2 |         st1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x20]
>          |         ^
>
> while GCC accepts it... It looks like GCC's -mgeneral-regs-only only
> impacts the compiler using floating-point and SIMD registers after [1]
> in GCC 6.x, whereas clang's restriction is on both the compiler and
> assembler. Perhaps clang should be adjusted to match but its behavior
> seems more desirable for the kernel to ensure floating-point code is
> properly separated and called between kernel_fpu_{begin,end}(). This
> error is resolved with the following diff.
>
> [1]: https://gcc.gnu.org/cgit/gcc/commit/?id=7d9425d46b58e69667300331aa55ebddddcceaeb
>
> Cheers,
> Nathan
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> index 06d0eb190816..a85fc21419d8 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> @@ -181,4 +181,5 @@ mlx5_dpll-y :=      dpll.o
>   # NEON WC specific for mlx5
>   #
>   mlx5_core-$(CONFIG_KERNEL_MODE_NEON) += lib/wc_neon_iowrite64_copy.o
> -FLAGS_lib/wc_neon_iowrite64_copy.o += $(CC_FLAGS_FPU)
> +CFLAGS_lib/wc_neon_iowrite64_copy.o += $(CC_FLAGS_FPU)
> +CFLAGS_REMOVE_lib/wc_neon_iowrite64_copy.o += $(CC_FLAGS_NO_FPU)

You are spot on. I checked my patchset, and the code I actually tested
(for performance, beyond just compilation) used the following:

ifeq ($(ARCH),arm64)
         CFLAGS_lib/neon_iowrite64_copy.o += -ffreestanding
         CFLAGS_REMOVE_lib/neon_iowrite64_copy.o += -mgeneral-regs-only
endif

This is actually equivalent to the diff you sent. Thanks for the
heads-up, I will fix and resend.
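
For reference, a minimal sketch of how the two forms line up. It assumes,
as this thread indicates, that on arm64 CC_FLAGS_FPU carries
-ffreestanding and CC_FLAGS_NO_FPU is -mgeneral-regs-only. The object
name follows the posted patch. This is an illustration only, not the
resent patch:

```make
# Sketch only: per-object flags for the NEON write-combining helper.
# Assumption: on arm64, $(CC_FLAGS_FPU) includes -ffreestanding and
# $(CC_FLAGS_NO_FPU) is -mgeneral-regs-only, so these lines should have
# the same effect as the hand-written arm64-only flags quoted above.
mlx5_core-$(CONFIG_KERNEL_MODE_NEON) += lib/wc_neon_iowrite64_copy.o

# Add the flags that allow FP/SIMD instructions in this one object.
CFLAGS_lib/wc_neon_iowrite64_copy.o += $(CC_FLAGS_FPU)

# Drop the flag that forbids them in the rest of the kernel build.
CFLAGS_REMOVE_lib/wc_neon_iowrite64_copy.o += $(CC_FLAGS_NO_FPU)
```

Using the kbuild variables rather than literal flags avoids the
ifeq ($(ARCH),arm64) guard, since the object is only built when
CONFIG_KERNEL_MODE_NEON is set.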

Thanks, Patrisious.