lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID: <20241015080617.79e90a06@kernel.org>
Date: Tue, 15 Oct 2024 08:06:17 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: saeedm@...dia.com, tariqt@...dia.com
Cc: Til Kaiser <mail@...54.de>, leonro@...dia.com, netdev@...r.kernel.org,
 linux-rdma@...r.kernel.org
Subject: Re: [BUG] net/mlx5: missing sysfs hwmon entry for ConnectX-4 cards

On Thu, 10 Oct 2024 17:11:21 +0200 Til Kaiser wrote:
> On our dual-port 100G ConnectX-4 cards (MT27700 Family) running Linux 
> kernel 6.6.56 with the latest ConnectX-4 firmware (version 12.28.2302), 
> there is no sysfs hwmon entry for reading temperature values.
> With kernel 6.6.32, the hwmon entry is present and I can read the 
> temperature values of those cards.
> Strangely, this problem does not occur on our ConnectX-4 Lx cards 
> (MT27710 Family), regardless of which kernel version I use.
> 
> I looked into the mlx5 core driver and noticed that it checks the 
> MCAM register here: 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/hwmon.c?h=v6.6.56#n380.
> When I removed that check, the hwmon entry reappeared.
> 
> Looking into recent mlx5 commits regarding this MCAM register, I found 
> this commit: 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v6.6.56&id=fb035aa9a3f8fd327ab83b15a94929d2b9045995.
> When I reverted this commit, the hwmon entry also reappeared.
> 
> I also found a firmware bug fix for that register in the ConnectX-4 Lx 
> firmware bug fix history (Ref. 2339971): 
> https://docs.nvidia.com/networking/display/connectx4lxfirmwarev14321900/bug+fixes+history.
> I couldn't find such a firmware fix for the non-Lx ConnectX-4 cards, so 
> I'm unsure whether this is an mlx5 driver issue or a firmware issue.
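
One way to confirm the symptom from user space is to look for an hwmon directory under the card's PCI device in sysfs. The Python sketch below assumes the standard hwmon sysfs ABI (`tempN_input` in millidegrees Celsius, optional `tempN_label`) and assumes the driver registers its hwmon device with the PCI function as parent, so that it appears as `<device>/hwmon/hwmonN`. The function name `read_hwmon_temps` and the interface name `enp1s0f0` are hypothetical. An empty result means no hwmon entry was registered for that device.

```python
from pathlib import Path


def read_hwmon_temps(device_dir):
    """Return {hwmonN: {sensor_label: degrees_C}} for hwmon devices under device_dir.

    Assumption: the standard hwmon sysfs layout, where each hwmonN directory
    holds tempN_input files (millidegrees Celsius) and optional tempN_label
    files. An empty dict means no hwmon entry exists for this device.
    """
    result = {}
    hwmon_root = Path(device_dir) / "hwmon"
    if not hwmon_root.is_dir():
        return result  # no hwmon device registered under this parent
    for hwmon in sorted(hwmon_root.glob("hwmon*")):
        temps = {}
        for inp in sorted(hwmon.glob("temp*_input")):
            sensor = inp.name[: -len("_input")]  # e.g. "temp1"
            label_file = hwmon / f"{sensor}_label"
            # Fall back to the raw sensor name when no label is exported.
            label = label_file.read_text().strip() if label_file.exists() else sensor
            temps[label] = int(inp.read_text().strip()) / 1000.0
        result[hwmon.name] = temps
    return result
```

For example, `read_hwmon_temps("/sys/class/net/enp1s0f0/device")` on an affected ConnectX-4 port would be expected to return `{}` under 6.6.56, while on 6.6.32 it should list the card's temperature sensors.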

Hi, any word on this? Sounds like a fairly straightforward problem.
