Message-ID: <ea3bde05-2b49-e985-5cb2-ecdda87fb3a5@ozlabs.ru>
Date: Tue, 21 Apr 2020 14:23:38 +1000
From: Alexey Kardashevskiy <aik@...abs.ru>
To: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Cc: Leon Romanovsky <leon@...nel.org>,
Saeed Mahameed <saeedm@...lanox.com>
Subject: mlx5_core irisc not responding
Hi!
I have a Mellanox CX4 card that constantly complains about "irisc not
responding" (below). Is there a way to get a better idea of what it is
unhappy about? It is plugged into an experimental POWER9 box which might
have PCI problems. The kernel is v5.6.0.
I thought I would try updating the firmware first, but mlxup refuses to
update it because it is an OEM adapter (below), and I cannot find a way to
work out which Mellanox PSID corresponds to what I have. Any hints? Thanks,
The device is:
root@...ssss2:~# mstflint -d 0001:19:00.0 q
Image type: FS3
FW Version: 14.26.0226
FW Release Date: 4.8.2019
Product Version: 6.0226
Description: UID GuidsNumber
Base GUID: 0894ef030080a89f 8
Base MAC: 00000894ef80a89f 8
Image VSD: N/A
Device VSD: N/A
PSID: IBM0000000034
Security Attributes: N/A
root@...swift2:~# ./mlxup
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: ConnectX4LX
Part Number: IBM_CX4LX_2p_10GE_x4_Ax
Description: ConnectX-4 LX 10 and 1 G-BaseT dual-port BP; PCIe3.0 x4;
PSID: IBM0000000034
PCI Device Name: 0001:19:00.0
Base MAC: 0894ef80a89f
Versions:         Current        Available
  FW              14.26.0226     N/A
Status:           No matching image found
dmesg (the same for 0001:19:00.1):
[ 13.283418] mlx5_core 0001:19:00.0: print_health_info:374:(pid 0): assert_var[0] 0x00000001
[ 13.283447] mlx5_core 0001:19:00.0: print_health_info:374:(pid 0): assert_var[1] 0x0087f14c
[ 13.283481] mlx5_core 0001:19:00.0: print_health_info:374:(pid 0): assert_var[2] 0x00000000
[ 13.283535] mlx5_core 0001:19:00.0: print_health_info:374:(pid 0): assert_var[3] 0x01020000
[ 13.283588] mlx5_core 0001:19:00.0: print_health_info:374:(pid 0): assert_var[4] 0x00000000
[ 13.283631] mlx5_core 0001:19:00.0: print_health_info:377:(pid 0): assert_exit_ptr 0x0080e428
[ 13.283667] mlx5_core 0001:19:00.0: print_health_info:379:(pid 0): assert_callra 0x0080e070
[ 13.283726] mlx5_core 0001:19:00.0: print_health_info:381:(pid 0): fw_ver 14.26.226
[ 13.283786] mlx5_core 0001:19:00.0: print_health_info:382:(pid 0): hw_id 0x0000020b
[ 13.283840] mlx5_core 0001:19:00.0: print_health_info:383:(pid 0): irisc_index 2
[ 13.283908] mlx5_core 0001:19:00.0: print_health_info:385:(pid 0): synd 0x7: irisc not responding
[ 13.283949] mlx5_core 0001:19:00.0: print_health_info:386:(pid 0): ext_synd 0x00c0
[ 13.284014] mlx5_core 0001:19:00.0: print_health_info:388:(pid 0): raw fw_ver 0xe01a00e2
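
To compare the dump between the two ports, or to quote it in a bug report, it may help to collect the print_health_info fields per PCI device. The Python sketch below does only that. It is not part of the driver or of any Mellanox tool, the name parse_health_dump is made up for illustration, and it assumes the line layout seen in the dmesg output above (including lines wrapped by the mail client). It does not interpret the values.

```python
import re

# Matches one health entry once wrapped lines have been joined, e.g.
# "... mlx5_core 0001:19:00.0: print_health_info:385:(pid 0): synd 0x7: ..."
# Assumption: the value always starts with a digit, as in the log above.
ENTRY = re.compile(
    r"mlx5_core (?P<dev>\S+): print_health_info:\d+:\(pid \d+\):\s+"
    r"(?P<key>\D.*?)\s+(?P<value>\d.*)$"
)


def parse_health_dump(text):
    """Return {pci_device: {field: value}} for print_health_info lines."""
    # Re-join entries that were wrapped: a line that does not start with a
    # "[timestamp]" is treated as the continuation of the previous line.
    joined = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("[") or not joined:
            joined.append(line)
        else:
            joined[-1] += " " + line

    devices = {}
    for line in joined:
        match = ENTRY.search(line)
        if match:
            fields = devices.setdefault(match["dev"], {})
            fields[match["key"]] = match["value"]
    return devices


# A few entries copied from the log above, still in wrapped form.
SAMPLE = """\
[ 13.283447] mlx5_core 0001:19:00.0: print_health_info:374:(pid 0):
assert_var[1] 0x0087f14c
[ 13.283908] mlx5_core 0001:19:00.0: print_health_info:385:(pid 0):
synd 0x7: irisc not responding
[ 13.284014] mlx5_core 0001:19:00.0: print_health_info:388:(pid 0):
raw fw_ver 0xe01a00e2
"""

dump = parse_health_dump(SAMPLE)
for device, fields in dump.items():
    for key, value in fields.items():
        print(f"{device} {key} = {value}")
```

Feeding it the saved output of dmesg for both functions would give one dictionary per port, which makes it easy to check whether the two dumps really are identical.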
--
Alexey