Message-ID: <733c15e7-2950-4dc7-93c0-11c4eff7ce0b@nvidia.com>
Date: Wed, 8 Jan 2025 11:26:39 +0200
From: Carolina Jubran <cjubran@...dia.com>
To: Samuel Dobron <sdobron@...hat.com>, Dragos Tatulea <dtatulea@...dia.com>,
Tariq Toukan <tariqt@...dia.com>, "daniel@...earbox.net"
<daniel@...earbox.net>, "hawk@...nel.org" <hawk@...nel.org>,
"mianosebastiano@...il.com" <mianosebastiano@...il.com>
Cc: "toke@...hat.com" <toke@...hat.com>, "pabeni@...hat.com"
<pabeni@...hat.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"edumazet@...gle.com" <edumazet@...gle.com>,
Saeed Mahameed <saeedm@...dia.com>, "bpf@...r.kernel.org"
<bpf@...r.kernel.org>, "kuba@...nel.org" <kuba@...nel.org>,
Benjamin Poirier <bpoirier@...hat.com>
Subject: Re: XDP Performance Regression in recent kernel versions
Hello,
Thank you, Sam, for the detailed information.
I have identified the specific kernel configuration change responsible
for the degradation between kernel versions
6.4.0-0.rc6.20230614gitb6dad5178cea.49.eln126 and
6.4.0-0.rc6.20230616git40f71e7cd3c6.50.eln126. The introduction of the
CONFIG_INIT_STACK_ALL_ZERO setting in the latter version has led to a
noticeable performance impact.
I am currently investigating why this change specifically affects mlx5.
Thanks,
Carolina
On 11/12/2024 15:20, Samuel Dobron wrote:
> Hey all,
>
> We recently enabled tests for XDP_TX, so I was able to test
> XDP_TX as well.
>
> The XDP_DROP performance regression is the same as I reported
> a while ago. There is about a 20% regression in
> kernel-6.4.0-0.rc6.20230616git40f71e7cd3c6.50.eln126 (broken)
> compared to the previous kernel,
> kernel-6.4.0-0.rc6.20230614gitb6dad5178cea.49.eln126 (baseline).
> We don't see such a regression with other drivers.
>
> The regression was partially fixed somewhere between eln126 and
> kernel-6.10.0-0.rc2.20240606git2df0193e62cf.27.eln137 (partially
> fixed), and since then performance has been 7-15% below the
> baseline. So, nothing new.
>
> XDP_TX, however, is more interesting.
> Comparing the baseline with the broken kernel, there is a 20-25%
> performance drop on the mlx5 driver (CPU utilization remains the
> same), and a 10% drop on other drivers as well. HOWEVER, it was
> fixed somewhere between the broken and the partially fixed kernels.
> On the most recent kernels, we no longer see the regression on other
> drivers, but a 2-10% regression (depending on whether dpa or
> load-bytes is used) remains on mlx5.
>
> The numbers look somewhat similar to the regression seen with
> Spectre/Meltdown mitigations enabled, but based on my experiments
> there is no difference with mitigations enabled or disabled.
>
> Hope this will help,
> Sam.
>
> On Tue, Jul 30, 2024 at 1:04 PM Samuel Dobron <sdobron@...hat.com> wrote:
>>
>>> Could you try adding the mentioned parameters to your kernel arguments
>>> and check if you still see the degradation?
>>
>> Hey,
>> So I tried multiple kernels around v5.15, as well as a couple of
>> earlier v6.x kernels, and there is no difference with Spectre v2
>> mitigations enabled or disabled.
>>
>> No difference on other drivers either.
>>
>>
>> Sam.
>>
>