Message-ID: <d259ffa9-6c9e-488f-a64f-81025deba75c@nvidia.com>
Date: Tue, 16 Sep 2025 11:39:06 +0300
From: Patrisious Haddad <phaddad@...dia.com>
To: Nathan Chancellor <nathan@...nel.org>, Jason Gunthorpe <jgg@...dia.com>
Cc: Tariq Toukan <tariqt@...dia.com>,
Catalin Marinas <catalin.marinas@....com>, Eric Dumazet
<edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>, Saeed Mahameed <saeedm@...dia.com>,
Leon Romanovsky <leon@...nel.org>, Mark Bloch <mbloch@...dia.com>,
Sabrina Dubroca <sd@...asysnail.net>, netdev@...r.kernel.org,
linux-rdma@...r.kernel.org, linux-kernel@...r.kernel.org,
Gal Pressman <gal@...dia.com>, Leon Romanovsky <leonro@...dia.com>,
Michael Guralnik <michaelgur@...dia.com>, Moshe Shemesh <moshe@...dia.com>,
Will Deacon <will@...nel.org>, Alexander Gordeev <agordeev@...ux.ibm.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Christian Borntraeger <borntraeger@...ux.ibm.com>,
Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
Vasily Gorbik <gor@...ux.ibm.com>, Heiko Carstens <hca@...ux.ibm.com>,
"H. Peter Anvin" <hpa@...or.com>, Justin Stitt <justinstitt@...gle.com>,
linux-s390@...r.kernel.org, llvm@...ts.linux.dev,
Ingo Molnar <mingo@...hat.com>, Bill Wendling <morbo@...gle.com>,
Nick Desaulniers <ndesaulniers@...gle.com>,
Salil Mehta <salil.mehta@...wei.com>, Sven Schnelle <svens@...ux.ibm.com>,
Thomas Gleixner <tglx@...utronix.de>, x86@...nel.org,
Yisen Zhuang <yisen.zhuang@...wei.com>, Arnd Bergmann <arnd@...db.de>,
Leon Romanovsky <leonro@...lanox.com>, linux-arch@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, Mark Rutland <mark.rutland@....com>,
Michael Guralnik <michaelgur@...lanox.com>, patches@...ts.linux.dev,
Niklas Schnelle <schnelle@...ux.ibm.com>, Jijie Shao <shaojijie@...wei.com>
Subject: Re: [PATCH net-next V2] net/mlx5: Improve write-combining test
reliability for ARM64 Grace CPUs
On 9/16/2025 2:15 AM, Nathan Chancellor wrote:
> On Mon, Sep 15, 2025 at 07:48:10PM -0300, Jason Gunthorpe wrote:
>> On Mon, Sep 15, 2025 at 03:27:58PM -0700, Nathan Chancellor wrote:
>>> On Mon, Sep 15, 2025 at 03:18:59PM -0700, Nathan Chancellor wrote:
>>>> On Mon, Sep 15, 2025 at 11:35:08AM +0300, Tariq Toukan wrote:
>>>> ...
>>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>>>>> index d77696f46eb5..06d0eb190816 100644
>>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
>>>>> @@ -176,3 +176,9 @@ mlx5_core-$(CONFIG_PCIE_TPH) += lib/st.o
>>>>>
>>>>> obj-$(CONFIG_MLX5_DPLL) += mlx5_dpll.o
>>>>> mlx5_dpll-y := dpll.o
>>>>> +
>>>>> +#
>>>>> +# NEON WC specific for mlx5
>>>>> +#
>>>>> +mlx5_core-$(CONFIG_KERNEL_MODE_NEON) += lib/wc_neon_iowrite64_copy.o
>>>>> +FLAGS_lib/wc_neon_iowrite64_copy.o += $(CC_FLAGS_FPU)
>>>> Does this work as is? I think this needs to be CFLAGS instead of FLAGS
>>>> but I did not test to verify.
>>> Also, Documentation/core-api/floating-point.rst states that code should
>>> also use CFLAGS_REMOVE_ for CC_FLAGS_NO_FPU as well as adding
>>> CC_FLAGS_FPU.
>>>
>>> CFLAGS_REMOVE_lib/wc_neon_iowrite64_copy.o += $(CC_FLAGS_NO_FPU)
>> I wondered if you needed the separate compilation unit at all since it
>> is all done with inline assembly.. Since the makefile seems to have a
>> typo, it suggests you don't need the compilation unit and it could
>> just be a little inline protected by CONFIG_KERNEL_MODE_NEON.
There is a difference between what merely compiles and the effect these
flags have on the generated assembly and on performance. To avoid finding
that out the hard way, I prefer to stick to the documentation, which does
what Nathan describes below: a separate compilation unit, built with the
correct flags, whose code is called between begin and end. That is also
what I originally tested. I failed to re-test after finishing my code
review, thinking my changes were only cosmetic ...
> Hmmm, clang rejects the current patch
>
> drivers/net/ethernet/mellanox/mlx5/core/lib/wc_neon_iowrite64_copy.c:9:3: error: instruction requires: neon
> 9 | ("ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%0]\n\t"
> | ^
> <inline asm>:1:2: note: instantiated into assembly here
> 1 | ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x19]
> | ^
> drivers/net/ethernet/mellanox/mlx5/core/lib/wc_neon_iowrite64_copy.c:9:48: error: instruction requires: neon
> 9 | ("ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%0]\n\t"
> | ^
> <inline asm>:2:2: note: instantiated into assembly here
> 2 | st1 {v0.16b, v1.16b, v2.16b, v3.16b}, [x20]
> | ^
>
> while GCC accepts it... It looks like GCC's -mgeneral-regs-only only
> impacts the compiler using floating-point and SIMD registers after [1]
> in GCC 6.x, whereas clang's restriction is on both the compiler and
> assembler. Perhaps clang should be adjusted to match but its behavior
> seems more desirable for the kernel to ensure floating-point code is
> properly separated and called between kernel_fpu_{begin,end}(). This
> error is resolved with the following diff.
>
> [1]: https://gcc.gnu.org/cgit/gcc/commit/?id=7d9425d46b58e69667300331aa55ebddddcceaeb
>
> Cheers,
> Nathan
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> index 06d0eb190816..a85fc21419d8 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
> @@ -181,4 +181,5 @@ mlx5_dpll-y := dpll.o
> # NEON WC specific for mlx5
> #
> mlx5_core-$(CONFIG_KERNEL_MODE_NEON) += lib/wc_neon_iowrite64_copy.o
> -FLAGS_lib/wc_neon_iowrite64_copy.o += $(CC_FLAGS_FPU)
> +CFLAGS_lib/wc_neon_iowrite64_copy.o += $(CC_FLAGS_FPU)
> +CFLAGS_REMOVE_lib/wc_neon_iowrite64_copy.o += $(CC_FLAGS_NO_FPU)
You are spot on. I checked my patchset, and the code I actually tested
(for performance, not just compilation) used the following:
ifeq ($(ARCH),arm64)
CFLAGS_lib/neon_iowrite64_copy.o += -ffreestanding
CFLAGS_REMOVE_lib/neon_iowrite64_copy.o += -mgeneral-regs-only
endif
This is equivalent to the diff you sent. Thanks for the heads-up, I
will fix and resend.
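For anyone following along, here is a minimal standalone GNU make sketch
of why the misspelled variable went unnoticed and what the corrected pair
of assignments does. It is a toy model, not kbuild's actual
implementation: base_cflags, obj and effective are made-up names, the
-DTYPO_NEVER_SEEN flag is a placeholder, and the other flag values are
the arm64 ones from the snippet above, used purely for illustration.

```make
# Toy model of per-object compiler flags. NOT kbuild's real implementation;
# base_cflags, obj and effective are hypothetical names for illustration.
base_cflags := -O2 -mgeneral-regs-only
obj := lib/wc_neon_iowrite64_copy.o

# Misspelled prefix: this only defines an unrelated variable that nothing
# below ever reads, so make reports no error and the flag is silently lost.
FLAGS_$(obj) += -DTYPO_NEVER_SEEN

# Correct prefixes: one adds flags for this object, the other lists flags
# to strip from the final command line.
CFLAGS_$(obj) += -ffreestanding
CFLAGS_REMOVE_$(obj) += -mgeneral-regs-only

# Combine base and per-object flags, then drop the ones marked for removal.
effective := $(filter-out $(CFLAGS_REMOVE_$(obj)),$(base_cflags) $(CFLAGS_$(obj)))

$(info effective flags for $(obj): $(effective))

# No-op default target so "make -f <file>" exits cleanly after printing.
all: ; @:
```

Saved as demo.mk and run with "make -f demo.mk", this should print a flag
list with -mgeneral-regs-only removed, -ffreestanding added, and no trace
of the flag set through the FLAGS_ variable.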
Thanks, Patrisious.