Message-ID: <Ztna8O1ZGUc4kvKJ@shredder.mtl.com>
Date: Thu, 5 Sep 2024 19:23:12 +0300
From: Ido Schimmel <idosch@...dia.com>
To: Krzysztof Olędzki <ole@....pl>
Cc: gal@...dia.com, Tariq Toukan <tariqt@...dia.com>,
Yishai Hadas <yishaih@...dia.com>,
Michal Kubecek <mkubecek@...e.cz>, Jakub Kicinski <kuba@...nel.org>,
Andrew Lunn <andrew@...n.ch>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [mlx4] Mellanox ConnectX2 (MHQH29C aka 26428) and module
diagnostic support (ethtool -m) issues
On Wed, Sep 04, 2024 at 09:47:04PM -0700, Krzysztof Olędzki wrote:
> This BTW looks like another problem:
>
> # ethtool -m eth1 hex on offset 254 length 1
> Offset Values
> ------ ------
> 0x00fe: 00
>
> # ethtool -m eth1 hex on offset 255 length 1
> Cannot get Module EEPROM data: Unknown error 1564
>
> mlx4_core 0000:01:00.0: MLX4_CMD_MAD_IFC Get Module info attr(ff60) port(1) i2c_addr(50) offset(255) size(1): Response Mad Status(61c) - invalid device_address or size (that is, size equals 0 or address+size is greater than 256)
> mlx4_en: eth1: mlx4_get_module_info i(0) offset(255) bytes_to_read(1) - FAILED (0xfffff9e4)
>
> With the netlink interface, ethtool seems to be only asking for the first 128 bytes, which works:
Yes. The upper 128 bytes are reserved so sff8079_show_all_nl() doesn't
bother querying them. Explains why you don't see this error with
netlink.
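A toy model of the difference, in case it helps. This is not ethtool or
mlx4 code: every name below is hypothetical, and the "firmware" simply
rejects offset 255 as in the log above. It assumes a legacy-style request
that covers the whole 256-byte range.

```python
# Toy model only: all names are hypothetical, nothing here is a real API.
SFF8079_LOWER_LEN = 128   # netlink-style request: offset 0, length 128
SFF8079_TOTAL_LEN = 256   # assumed legacy-style request: the whole range
BAD_OFFSET = 255          # the offset that failed in the report above


class ModuleReadError(Exception):
    """Stands in for the error status returned by the firmware."""


def read_byte(offset):
    # Pretend firmware: every byte is readable except the one that failed.
    if offset == BAD_OFFSET:
        raise ModuleReadError(f"read failed at offset {offset}")
    return 0x00


def dump(offset, length):
    """Read `length` bytes starting at `offset`, one byte at a time."""
    return bytes(read_byte(o) for o in range(offset, offset + length))


def netlink_style_dump():
    # Only the lower 128 bytes; the reserved upper half is never requested.
    return dump(0, SFF8079_LOWER_LEN)


def legacy_style_dump():
    # The whole range, so the request eventually touches offset 255.
    return dump(0, SFF8079_TOTAL_LEN)
```

In this model netlink_style_dump() succeeds because it never goes past
offset 127, while legacy_style_dump() raises once it reaches offset 255.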
Regarding the runtime "--disable-netlink" patch, I personally don't mind
and Andrew seems in favor, so please post a proper patch and let's see
what Michal says.
Regarding the patch that unmasks the I2C address error, I would target
it at net-next as it doesn't really fix a bug (ethtool already displays
what it can). Thinking about it, I believe it would be more worthwhile
to implement the much simpler get_module_eeprom_by_page() ethtool
operation in mlx4 (I can help with the review). It would have helped
avoid the current issue (kernel will return an error) and the
previous bug [1] you encountered with the legacy operations.
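To illustrate the shape of a by-page read, here is a rough Python model,
not driver code. The PageRequest fields mirror the kernel's
struct ethtool_module_eeprom; the fake firmware, the errno values and the
(count, data) return convention are assumptions made for the sketch.

```python
import errno
from dataclasses import dataclass


@dataclass
class PageRequest:
    # Field names follow struct ethtool_module_eeprom (minus the data buffer).
    i2c_address: int
    page: int
    bank: int
    offset: int
    length: int


def fake_fw_read(i2c_address, page, offset, length):
    """Hypothetical firmware that rejects offset 255, as in the report."""
    if length == 0 or offset + length > 255:
        raise OSError(errno.EIO, "firmware rejected the request")
    return bytes(length)


def get_module_eeprom_by_page(req, fw_read=fake_fw_read):
    """Serve one request; return (bytes read, data) or (negative errno, b"")."""
    # Basic sanity check on the request (assumed, for the sketch only).
    if req.length == 0 or req.offset + req.length > 256:
        return -errno.EINVAL, b""
    try:
        data = fw_read(req.i2c_address, req.page, req.offset, req.length)
    except OSError as err:
        # The failure goes straight back to the caller; nothing is masked.
        return -err.errno, b""
    return len(data), data
```

A request for offset 0, length 128 returns 128 bytes, while a request for
offset 255, length 1 comes back as a negative errno, so the caller sees
the failure instead of partial or zeroed data.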
Regarding the fact that these modules work properly with CX3, but not
with CX2 (which uses the same driver), it really seems like a HW/FW
problem and unfortunately I can't help with that.
[1] https://lore.kernel.org/all/b17c5336-6dc3-41f2-afa6-f9e79231f224@ans.pl/