Message-ID: <CAHC9VhT+=DtJ1K1CJDY4=L_RRJSGqRDvnaOdA6j9n+bF7y+36A@mail.gmail.com>
Date: Sun, 9 Apr 2023 19:50:34 -0400
From: Paul Moore <paul@...l-moore.com>
To: Linux regressions mailing list <regressions@...ts.linux.dev>
Cc: Saeed Mahameed <saeed@...nel.org>, Shay Drory <shayd@...dia.com>,
Saeed Mahameed <saeedm@...dia.com>, netdev@...r.kernel.org,
selinux@...r.kernel.org
Subject: Re: Potential regression/bug in net/mlx5 driver
On Sun, Apr 9, 2023 at 4:48 AM Linux regression tracking (Thorsten
Leemhuis) <regressions@...mhuis.info> wrote:
> On 30.03.23 03:27, Paul Moore wrote:
> > On Wed, Mar 29, 2023 at 6:20 PM Saeed Mahameed <saeed@...nel.org> wrote:
> >> On 28 Mar 19:08, Paul Moore wrote:
> >>>
> >>> Starting with the v6.3-rcX kernel releases I noticed that my
> >>> InfiniBand devices were no longer present under /sys/class/infiniband,
> >>> causing some of my automated testing to fail. It took me a while to
> >>> find the time to bisect the issue, but I eventually identified the
> >>> problematic commit:
> >>>
> >>> commit fe998a3c77b9f989a30a2a01fb00d3729a6d53a4
> >>> Author: Shay Drory <shayd@...dia.com>
> >>> Date: Wed Jun 29 11:38:21 2022 +0300
> >>>
> >>> net/mlx5: Enable management PF initialization
> >>>
> >>> Enable initialization of DPU Management PF, which is a new loopback PF
> >>> designed for communication with BMC.
> >>> For now Management PF doesn't support nor require most upper layer
> >>> protocols so avoid them.
> >>>
> >>> Signed-off-by: Shay Drory <shayd@...dia.com>
> >>> Reviewed-by: Eran Ben Elisha <eranbe@...dia.com>
> >>> Reviewed-by: Moshe Shemesh <moshe@...dia.com>
> >>> Signed-off-by: Saeed Mahameed <saeedm@...dia.com>
> >>>
> > >>> I'm not an mlx5 driver expert so I can't really offer much in the way
> >>> of a fix, but as a quick test I did remove the
> >>> 'mlx5_core_is_management_pf(...)' calls in mlx5/core/dev.c and
> >>> everything seemed to work okay on my test system (or rather the tests
> >>> ran without problem).
> >>>
> >>> If you need any additional information, or would like me to test a
> >>> patch, please let me know.
> >>
> >> Our team is looking into this, the current theory is that you have an old
> >> FW that doesn't have the correct capabilities set.
> >
> > That's very possible; I installed this card many years ago and haven't
> > updated the FW once.
> >
> > I'm happy to update the FW (do you have a
> > pointer/how-to?), but it might be good to identify a fix first as I'm
> > guessing there will be others like me ...
>
> Nothing happened here for about ten days afaics (or was there progress
> and I just missed it?). That made me wonder: how sound is Paul's guess
> that there will be others that might run into this? If that's likely it
> afaics would be good to get this regression fixed before the release,
> which is just two or three weeks away.
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> #regzbot poke
I haven't seen any updates from the mlx5 driver folks, although I may
not have been CC'd?
I did revert that commit on my automated testing kernels and things
are working correctly again, although I'm pretty sure that's not a
good long-term solution. I also dug up the information on updating
the card's firmware, but I'm holding off on that in case the driver
devs want me to test a fix.
--
paul-moore.com