Message-ID: <56fcf799-d6ac-4189-8c6f-ebc0d8f694d8@ans.pl>
Date: Wed, 4 Sep 2024 22:00:07 -0700
From: Krzysztof Olędzki <ole@....pl>
To: Ido Schimmel <idosch@...dia.com>, gal@...dia.com
Cc: Tariq Toukan <tariqt@...dia.com>, Yishai Hadas <yishaih@...dia.com>,
        Michal Kubecek <mkubecek@...e.cz>, Jakub Kicinski <kuba@...nel.org>,
        Andrew Lunn <andrew@...n.ch>,
        "netdev@...r.kernel.org"
 <netdev@...r.kernel.org>
Subject: Re: [mlx4] Mellanox ConnectX2 (MHQH29C aka 26428) and module
 diagnostic support (ethtool -m) issues

On 04.09.2024 at 21:47, Krzysztof Olędzki wrote:
> On 04.09.2024 at 21:08, Krzysztof Olędzki wrote:
>> On 04.09.2024 at 06:00, Ido Schimmel wrote:
>>> I see Tariq is OOO so I'm adding Gal who might be able to help with
>>> CX2/mlx4 issues.
>>>
>>> On Sat, Aug 31, 2024 at 11:28:03PM -0700, Krzysztof Olędzki wrote:
>>>> Hi,
>>>>
>>>> I noticed that module diagnostics on a Mellanox ConnectX2 NIC (MHQH29C aka 26428 aka 15b3:673c, FW version 2.10.0720) behave in somewhat strange ways.
>>>>
>>>> 1. For SFP modules the driver is able to read the first page but not the 2nd one:
>>>>
>>>> [  318.082923] mlx4_core 0000:01:00.0: MLX4_CMD_MAD_IFC Get Module info attr(ff60) port(1) i2c_addr(51) offset(0) size(48): Response Mad Status(71c) - invalid I2C slave address
>>>> [  318.082936] mlx4_en: eth1: mlx4_get_module_info i(0) offset(256) bytes_to_read(128) - FAILED (0xfffff8e4)
>>> I assume you are using a relatively recent ethtool with netlink support.
>> Yes, sorry for not stating this explicitly:
>>
>> # ethtool  --version
>> ethtool version 6.10
>>
>>> It should only try to read from I2C address 0x51 if the module indicated
>>> support for diagnostics via bit 6 in byte 92.
>>>
>>> A few things worth checking:
>>>
>>> 1. mlx4 does not implement the modern get_module_eeprom_by_page() ethtool
>>> operation so what gets invoked is the fallback path in
>>> eeprom_fallback(). Can you try to rule out problems in this path by
>>> compiling ethtool without netlink support (i.e., ./configure
>>> --disable-netlink) and retesting? I don't think it will make a
>>> difference, but worth trying.
>> Right... I should have thought about this.
>>
>> Interestingly, this makes things even worse:
>>
>> # ethtool  -m eth2
>> Cannot get Module EEPROM data: Unknown error 1564
>>
>> mlx4_core 0000:01:00.0: MLX4_CMD_MAD_IFC Get Module info attr(ff60) port(2) i2c_addr(50) offset(240) size(16): Response Mad Status(61c) - invalid device_address or size (that is, size equals 0 or address+size is greater than 256)
>> mlx4_en: eth2: mlx4_get_module_info i(240) offset(240) bytes_to_read(272) - FAILED (0xfffff9e4)
>>
>> 1564 is mlx4_get_module_info() incorrectly returning -0x61c coming from "Response Mad Status(61c)"...

And 240 comes from:

#define MODULE_INFO_MAX_READ 48

mlx4_get_module_info() has:

        if (size > MODULE_INFO_MAX_READ)
                size = MODULE_INFO_MAX_READ;

So, the reads are:
  0.. 47 (48)
 48.. 95 (48)
 96..143 (48)
144..191 (48)
192..239 (48)
240..255 (16) <- this one fails

> 
> This BTW looks like another problem:
> 
> # ethtool -m eth1 hex on offset 254 length 1
> Offset          Values
> ------          ------
> 0x00fe:         00
> 
> # ethtool -m eth1 hex on offset 255 length 1
> Cannot get Module EEPROM data: Unknown error 1564
> 
> mlx4_core 0000:01:00.0: MLX4_CMD_MAD_IFC Get Module info attr(ff60) port(1) i2c_addr(50) offset(255) size(1): Response Mad Status(61c) - invalid device_address or size (that is, size equals 0 or address+size is greater than 256)
> mlx4_en: eth1: mlx4_get_module_info i(0) offset(255) bytes_to_read(1) - FAILED (0xfffff9e4)
> 
> With the netlink interface, ethtool seems to be asking only for the first 128 bytes, which works:
> 
> sending genetlink packet (76 bytes):
>     msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
>     ETHTOOL_MSG_MODULE_EEPROM_GET
>         ETHTOOL_A_MODULE_EEPROM_HEADER
>             ETHTOOL_A_HEADER_DEV_NAME = "eth1"
>         ETHTOOL_A_MODULE_EEPROM_LENGTH = 128
>         ETHTOOL_A_MODULE_EEPROM_OFFSET = 0
>         ETHTOOL_A_MODULE_EEPROM_PAGE = 0
>         ETHTOOL_A_MODULE_EEPROM_BANK = 0
>         ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS = 81
> 
> For the ioctl one, it looks like we want all 512 bytes but reading fails when trying to read 
> 16 bytes @240 because of the 255 octet issue. As the driver only masks the "invalid I2C slave address"
> error (CABLE_INF_I2C_ADDR) but not "invalid device_address or size" (CABLE_INF_INV_ADDR) it is
> able to successfully read 255 octets from A0h, and also to fake 256 from A2h:
> 
> # ethtool -m eth1 hex on offset 256
> Offset          Values
> ------          ------
> 0x0100:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x0110:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x0120:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x0130:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x0140:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x0150:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x0160:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x0170:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x0180:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x0190:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x01a0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x01b0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x01c0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x01d0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x01e0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x01f0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 
> (again with "i2c_addr(51) offset(0) size(48): Response Mad Status(71c) - invalid I2C slave address")
> 
> To compare, this is what we get from CX3 pro:
> 
> Offset          Values
> ------          ------
> 0x0100:         55 00 fb 00 50 00 00 00 8c a0 75 30 87 28 7a 44
> 0x0110:         14 82 04 e2 14 82 04 e2 4e 20 04 ec 1e dc 0c 62
> 0x0120:         4e 20 01 3b 1e dc 01 3b 00 00 00 00 00 00 00 00
> 0x0130:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x0140:         00 00 00 00 3f 80 00 00 00 00 00 00 01 00 00 00
> 0x0150:         01 00 00 00 01 00 00 00 01 00 00 00 00 00 00 3f
> 0x0160:         24 4e 7f 55 0e f6 16 c1 00 01 00 00 00 00 32 00
> 0x0170:         00 40 00 00 00 40 00 00 00 00 1c 00 00 00 00 00
> 0x0180:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x0190:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x01a0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x01b0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x01c0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x01d0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x01e0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x01f0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 
> Also no issue here:
> 0x00f0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  
>> Also... I think this is the 3rd time I'm recompiling ethtool without netlink support.
>> Would it make sense to add a runtime --disable-netlink option?
>>
>> RFC quality patch:
>>
>> Subject: [PATCH ethtool] Add runtime support for disabling netlink
>>
>> Provide --disable-netlink option for disabling netlink during runtime,
>> without the need to recompile the binary.
>>
>> Signed-off-by: Krzysztof Piotr Oledzki <ole@....pl>
>> ---
>>  ethtool.8.in      | 6 ++++++
>>  ethtool.c         | 6 ++++++
>>  internal.h        | 1 +
>>  netlink/netlink.c | 5 +++++
>>  4 files changed, 18 insertions(+)
>>
>> diff --git a/ethtool.8.in b/ethtool.8.in
>> index 11bb0f9..0b54983 100644
>> --- a/ethtool.8.in
>> +++ b/ethtool.8.in
>> @@ -137,6 +137,9 @@ ethtool \- query or control network driver and hardware settings
>>  .BN --debug
>>  .I args
>>  .HP
>> +.B ethtool [--disable-netlink]
>> +.I args
>> +.HP
>>  .B ethtool [--json]
>>  .I args
>>  .HP
>> @@ -579,6 +582,9 @@ lB	l.
>>  0x10  Structure of netlink messages
>>  .TE
>>  .TP
>> +.BI \-\-disable-netlink
>> +Do not use netlink and fall back to the ioctl interface if possible.
>> +.TP
>>  .BI \-\-json
>>  Output results in JavaScript Object Notation (JSON). Only a subset of
>>  options support this. Those which do not will continue to output
>> diff --git a/ethtool.c b/ethtool.c
>> index 7f47407..dc28069 100644
>> --- a/ethtool.c
>> +++ b/ethtool.c
>> @@ -6537,6 +6537,12 @@ int main(int argc, char **argp)
>>  			argc -= 2;
>>  			continue;
>>  		}
>> +		if (*argp && !strcmp(*argp, "--disable-netlink")) {
>> +			ctx.nl_disable = true;
>> +			argp += 1;
>> +			argc -= 1;
>> +			continue;
>> +		}
>>  		if (*argp && !strcmp(*argp, "--json")) {
>>  			ctx.json = true;
>>  			argp += 1;
>> diff --git a/internal.h b/internal.h
>> index 4b994f5..84c64be 100644
>> --- a/internal.h
>> +++ b/internal.h
>> @@ -221,6 +221,7 @@ struct cmd_context {
>>  	char **argp;		/* arguments to the sub-command */
>>  	unsigned long debug;	/* debugging mask */
>>  	bool json;		/* Output JSON, if supported */
>> +	bool nl_disable;	/* Disable netlink even if available */
>>  	bool show_stats;	/* include command-specific stats */
>>  #ifdef ETHTOOL_ENABLE_NETLINK
>>  	struct nl_context *nlctx;	/* netlink context (opaque) */
>> diff --git a/netlink/netlink.c b/netlink/netlink.c
>> index ef0d825..3cf1710 100644
>> --- a/netlink/netlink.c
>> +++ b/netlink/netlink.c
>> @@ -470,6 +470,11 @@ void netlink_run_handler(struct cmd_context *ctx, nl_chk_t nlchk,
>>  	const char *reason;
>>  	int ret;
>>  
>> +	if (ctx->nl_disable) {
>> +		reason = "netlink disabled";
>> +		goto no_support;
>> +	}
>> +
>>  	if (nlchk && !nlchk(ctx)) {
>>  		reason = "ioctl-only request";
>>  		goto no_support;
> 
>>> 2. Can you test this transceiver with a different NIC?
>> Ah, yes. Sorry once again - I had done that already, and of course it works,
>> which is why I came here and explicitly blamed CX2.
>>
>> Here is the output from a CX3 Pro NIC:
>>
>> # ethtool -m eth2
>>         Identifier                                : 0x03 (SFP)
>> (...)
>>         Optical diagnostics support               : Yes
>>         Laser bias current                        : 7.574 mA
>>         Laser output power                        : 0.5815 mW / -2.35 dBm
>>         Receiver signal average optical power     : 0.0001 mW / -40.00 dBm
>>         Module temperature                        : 32.13 degrees C / 89.83 degrees F
>>         Module voltage                            : 3.2714 V
>>         Alarm/warning flags implemented           : Yes
>>         Laser bias current high alarm             : Off
>>         Laser bias current low alarm              : Off
>>         Laser bias current high warning           : Off
>>         Laser bias current low warning            : Off
>>         Laser output power high alarm             : Off
>>         Laser output power low alarm              : Off
>>         Laser output power high warning           : Off
>>         Laser output power low warning            : Off
>>         Module temperature high alarm             : Off
>>         Module temperature low alarm              : Off
>>         Module temperature high warning           : Off
>>         Module temperature low warning            : Off
>>         Module voltage high alarm                 : Off
>>         Module voltage low alarm                  : Off
>>         Module voltage high warning               : Off
>>         Module voltage low warning                : Off
>>         Laser rx power high alarm                 : Off
>>         Laser rx power low alarm                  : On
>>         Laser rx power high warning               : Off
>>         Laser rx power low warning                : On
>>         Laser bias current high alarm threshold   : 10.500 mA
>>         Laser bias current low alarm threshold    : 2.500 mA
>>         Laser bias current high warning threshold : 10.500 mA
>>         Laser bias current low warning threshold  : 2.500 mA
>>         Laser output power high alarm threshold   : 2.0000 mW / 3.01 dBm
>>         Laser output power low alarm threshold    : 0.1260 mW / -9.00 dBm
>>         Laser output power high warning threshold : 0.7900 mW / -1.02 dBm
>>         Laser output power low warning threshold  : 0.3170 mW / -4.99 dBm
>>         Module temperature high alarm threshold   : 85.00 degrees C / 185.00 degrees F
>>         Module temperature low alarm threshold    : -5.00 degrees C / 23.00 degrees F
>>         Module temperature high warning threshold : 80.00 degrees C / 176.00 degrees F
>>         Module temperature low warning threshold  : 0.00 degrees C / 32.00 degrees F
>>         Module voltage high alarm threshold       : 3.6000 V
>>         Module voltage low alarm threshold        : 3.0000 V
>>         Module voltage high warning threshold     : 3.4600 V
>>         Module voltage low warning threshold      : 3.1300 V
>>         Laser rx power high alarm threshold       : 2.0000 mW / 3.01 dBm
>>         Laser rx power low alarm threshold        : 0.0315 mW / -15.02 dBm
>>         Laser rx power high warning threshold     : 0.7900 mW / -1.02 dBm
>>         Laser rx power low warning threshold      : 0.0315 mW / -15.02 dBm
>>
>>
>>> 3. I'm wondering if this transceiver requires an "address change
>>> sequence" before accessing I2C address 0x51 (see SFF-8472 Section 8.9
>>> Addressing Modes). The generic SFP driver doesn't support it (see
>>> sfp_module_parse_sff8472()) and other drivers probably don't support it
>>> as well. Can you look at an hexdump of page 0 and see if this bit is
>>> set? If so, maybe the correct thing to do would be to teach the SFF-8472
>>> parser to look at both bit 2 and bit 6 before trying to access this I2C
>>> address.
>> That would be byte 92, correct?
>>
>> # ethtool -m eth2 raw on offset 92 length 1|hexdump -C
>> 00000000  68                                                |h|
>> 00000001
>>
>> 0x68 = 01101000b:
>>
>> - 6 Digital diagnostic monitoring implemented (described in this document).
>> - 5 Internally calibrated
>> - 3 Received power measurement type: 0 = OMA, 1 = average power
>>
>>>> However, as the driver intentionally tries to mask the problem [1], ethtool reports "Optical diagnostics support" as being available and shows completely wrong information [2].
>>>>
>>>> Removing the workaround allows ethtool to recognize the problem and handle everything correctly [3]:
>>>> ---- cut here ----
>>>> --- a/drivers/net/ethernet/mellanox/mlx4/port.c	2024-07-27 02:34:11.000000000 -0700
>>>> +++ b/drivers/net/ethernet/mellanox/mlx4/port.c	2024-08-31 21:57:11.211612505 -0700
>>>> @@ -2197,14 +2197,7 @@
>>>>  			  0xFF60, port, i2c_addr, offset, size,
>>>>  			  ret, cable_info_mad_err_str(ret));
>>>>  
>>>> -		if (i2c_addr == I2C_ADDR_HIGH &&
>>>> -		    MAD_STATUS_2_CABLE_ERR(ret) == CABLE_INF_I2C_ADDR)
>>>> -			/* Some SFP cables do not support i2c slave
>>>> -			 * address 0x51 (high page), abort silently.
>>>> -			 */
>>>> -			ret = 0;
>>>> -		else
>>>> -			ret = -ret;
>>>> +		ret = -ret;
>>>>  		goto out;
>>>>  	}
>>>>  	cable_info = (struct mlx4_cable_info *)outm
>>>> ---- cut here ----
>>>>
>>>> However, we end up with a strange "netlink error: Unknown error 1820" error because mlx4_get_module_info returns -0x71c (0x71c is 1820 in decimal).
>>>>
>>>> This can be fixed with returning -EIO instead of ret, either in mlx4_get_module_info() or perhaps better mlx4_en_get_module_eeprom() from en_ethtool.c:
>>>> ---- cut here ----
>>>> --- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c	2024-07-27 02:34:11.000000000 -0700
>>>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c	2024-08-31 21:52:50.370553218 -0700
>>>> @@ -2110,7 +2110,7 @@
>>>>  			en_err(priv,
>>>>  			       "mlx4_get_module_info i(%d) offset(%d) bytes_to_read(%d) - FAILED (0x%x)\n",
>>>>  			       i, offset, ee->len - i, ret);
>>>> -			return ret;
>>>> +			return -EIO;
>>>>  		}
>>>>  
>>>>  		i += ret;
>>>> ---- cut here ----
>>>>
>>>> BTW: it is also possible to augment the error reporting in ethtool/sfpid.c:
>>>> ---- cut here ----
>>>> -       if (ret)
>>>> +       if (ret) {
>>>> +               fprintf(stderr, "Failed to read Page A2h.\n");
>>>>                 goto out;
>>>> +       }
>>>> ---- cut here ----
>>>> With all the above changes, we now get:
>>>>
>>>> ---- cut here ----
>>>>         Identifier                                : 0x03 (SFP)
>>>>         Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
>>>> (...)
>>>>         Date code                                 : <REDACTED>
>>>> netlink error: Input/output error
>>>> Failed to read Page A2h.
>>>> ---- cut here ----
>>>>
>>>> So, the first question is whether the above set of fixes makes sense, given that ethtool handles this correctly? If so, I'm happy to send the fixes.
>>> I believe it makes sense for the driver to return an error rather than
>>> mask the problem and return the wrong information (zeroes).
>> Alright, will work on the patches. Thank you BTW for your very patient and
>> encouraging review and support the last time. Greatly appreciated.
>>
>>>> The second question is whether not being able to read Page A2h and the "invalid I2C slave address" error are due to a bug in the driver or a HW (firmware?) limitation, and whether something can be done to address this?
>>> Let's see if it's related to the "address change sequence" I mentioned
>>> above. Maybe that's why the error masking was put in mlx4 in the first
>>> place.
>> So it seems it is not, but please double check after me.
>>
>>>> 2. For a QSFP module (which works in CX3/CX3Pro), handling "ethtool -m" seems to be completely broken.
>>> Given it works with CX3, then the problem is most likely with CX2 HW/FW.
>>> Gal, can you or someone from the team look into it?
>> On 04.09.2024 at 08:09, Gal Pressman wrote:
>>>> ConnectX-2 is End-of-Life since 2015 and End-of-Service since 2017..
>>>>
>> Yes, I am very familiar with these terms and aware of the EoL / EoS situation. 
>>
>> However, the HW still works (and actually works very well), it is still
>> supported even by the most recent Linux kernels, and for non-prod use
>> cases like mine (retro workstation) it may be hard to make an argument that
>> it should not be used. Especially that it supports everything I need,
>> including PXE booting with iPXE, and BTW - the fact that Mellanox even
>> provided sources allowing to build / tweak the mrom is truly unique and
>> amazing.
>>
>> That said, I totally get that *if* this is a FW issue, getting a new
>> version is rather unlikely, even if it used to be a "best-in-class"
>> premium NIC... 15 years ago? 😉 
>>
>>>> With QSFP module in port #2 (eth2), for the first attempt (ethtool -m eth2):
>>>> mlx4_core 0000:01:00.0: MLX4_CMD_MAD_IFC Get Module info attr(ff60) port(2) i2c_addr(50) offset(0) size(48): Response Mad Status(41c) - the connected cable has no EPROM (passive copper cable)
>>>> mlx4_en: eth2: mlx4_get_module_info i(0) offset(0) bytes_to_read(128) - FAILED (0xfffffbe4)
>>>>
>>>> However, if I first run "ethtool -m eth1" with an SFP module installed in port #1, and then immediately "ethtool -m eth2", I end up getting the information for the SFP module:
>>>> # ethtool -m eth2
>>>>         Identifier                                : 0x03 (SFP)
>>>>         Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
>>>> (...)
>>>>
>>>> In this case, I even get the same "invalid I2C slave address" error:
>>>> mlx4_core 0000:01:00.0: MLX4_CMD_MAD_IFC Get Module info attr(ff60) port(2) i2c_addr(51) offset(0) size(48): Response Mad Status(71c) - invalid I2C slave address
>>>>
>>>> If I immediately run "ethtool -m eth1" I get:
>>>> mlx4_core 0000:01:00.0: MLX4_CMD_MAD_IFC Get Module info attr(ff60) port(1) i2c_addr(50) offset(224) size(32): Response Mad Status(61c) - invalid device_address or size (that is, size equals 0 or address+size is greater than 256)
>>>> mlx4_en: eth1: mlx4_get_module_info i(96) offset(224) bytes_to_read(32) - FAILED (0xfffff9e4)
>>>>
>>>> Alternatively, if I remove the SFP module from port #1 and run "ethtool -m eth2", I get:
>>>> [ 1071.945737] mlx4_core 0000:01:00.0: MLX4_CMD_MAD_IFC Get Module ID attr(ff60) port(2) i2c_addr(50) offset(0) size(1): Response Mad Status(31c) - cable is not connected
>>>>
>>>> At this point, running "ethtool -m eth1" produces one of:
>>>>
>>>> *)
>>>>  mlx4_core 0000:01:00.0: MLX4_CMD_MAD_IFC Get Module ID attr(ff60) port(2) i2c_addr(50) offset(0) size(1): Response Mad Status(41c) - the connected cable has no EPROM (passive copper cable)
>>>>
>>>> *)
>>>>  mlx4_core 0000:01:00.0: MLX4_CMD_MAD_IFC Get Module info attr(ff60) port(2) i2c_addr(50) offset(128) size(48): Response Mad Status(41c) - the connected cable has no EPROM (passive copper cable)
>>>>  mlx4_en: eth2: mlx4_get_module_info i(0) offset(128) bytes_to_read(128) - FAILED (0xfffffbe4)
>>>>
>>>> *)
>>>>  mlx4_core 0000:01:00.0: MLX4_CMD_MAD_IFC Get Module ID attr(ff60) port(2) i2c_addr(50) offset(0) size(1): Response Mad Status(41c) - the connected cable has no EPROM (passive copper cable)
>>>>
>>>> *)
>>>>  mlx4_core 0000:01:00.0: MLX4_CMD_MAD_IFC Get Module info attr(ff60) port(2) i2c_addr(50) offset(0) size(48): Response Mad Status(41c) - the connected cable has no EPROM (passive copper cable)
>>>>  mlx4_en: eth2: mlx4_get_module_info i(0) offset(0) bytes_to_read(128) - FAILED (0xfffffbe4)
>>>>
>>>> *)
>>>>  mlx4_core 0000:01:00.0: MLX4_CMD_MAD_IFC Get Module ID attr(ff60) port(2) i2c_addr(50) offset(0) size(1): Response Mad Status(41c) - the connected cable has no EPROM (passive copper cable)
>>>>
>>>> I wonder if in this situation we are communicating with the wrong device, or returning some stale data from kernel memory or the firmware?
>>>>
>>>> Thanks,
>>>>  Krzysztof
>>>>
>>>> [1]
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx4/port.c#n2200
>>>>
>>>>
>>>> [2]
>>>>         Identifier                                : 0x03 (SFP)
>>>>         Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
>>>> (...) 
>>>>         Optical diagnostics support               : Yes
>>>>         Laser bias current                        : 0.000 mA
>>>>         Laser output power                        : 0.0000 mW / -inf dBm
>>>>         Receiver signal average optical power     : 0.0000 mW / -inf dBm
>>>>         Module temperature                        : 0.00 degrees C / 32.00 degrees F
>>>>         Module voltage                            : 0.0000 V
>>>>         Alarm/warning flags implemented           : Yes
>>>>         Laser bias current high alarm             : Off
>>>>         Laser bias current low alarm              : Off
>>>>         Laser bias current high warning           : Off
>>>>         Laser bias current low warning            : Off
>>>>         Laser output power high alarm             : Off
>>>>         Laser output power low alarm              : Off
>>>>         Laser output power high warning           : Off
>>>>         Laser output power low warning            : Off
>>>>         Module temperature high alarm             : Off
>>>>         Module temperature low alarm              : Off
>>>>         Module temperature high warning           : Off
>>>>         Module temperature low warning            : Off
>>>>         Module voltage high alarm                 : Off
>>>>         Module voltage low alarm                  : Off
>>>>         Module voltage high warning               : Off
>>>>         Module voltage low warning                : Off
>>>>         Laser rx power high alarm                 : Off
>>>>         Laser rx power low alarm                  : Off
>>>>         Laser rx power high warning               : Off
>>>>         Laser rx power low warning                : Off
>>>>         Laser bias current high alarm threshold   : 0.000 mA
>>>>         Laser bias current low alarm threshold    : 0.000 mA
>>>>         Laser bias current high warning threshold : 0.000 mA
>>>>         Laser bias current low warning threshold  : 0.000 mA
>>>>         Laser output power high alarm threshold   : 0.0000 mW / -inf dBm
>>>>         Laser output power low alarm threshold    : 0.0000 mW / -inf dBm
>>>>         Laser output power high warning threshold : 0.0000 mW / -inf dBm
>>>>         Laser output power low warning threshold  : 0.0000 mW / -inf dBm
>>>>         Module temperature high alarm threshold   : 0.00 degrees C / 32.00 degrees F
>>>>         Module temperature low alarm threshold    : 0.00 degrees C / 32.00 degrees F
>>>>         Module temperature high warning threshold : 0.00 degrees C / 32.00 degrees F
>>>>         Module temperature low warning threshold  : 0.00 degrees C / 32.00 degrees F
>>>>         Module voltage high alarm threshold       : 0.0000 V
>>>>         Module voltage low alarm threshold        : 0.0000 V
>>>>         Module voltage high warning threshold     : 0.0000 V
>>>>         Module voltage low warning threshold      : 0.0000 V
>>>>         Laser rx power high alarm threshold       : 0.0000 mW / -inf dBm
>>>>         Laser rx power low alarm threshold        : 0.0000 mW / -inf dBm
>>>>         Laser rx power high warning threshold     : 0.0000 mW / -inf dBm
>>>>         Laser rx power low warning threshold      : 0.0000 mW / -inf dBm
>>>>
>>>> [3]
>>>> # ethtool -m eth1
>>>>         Identifier                                : 0x03 (SFP)
>>>>         Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
>>>>         Connector                                 : 0x07 (LC)
>>>>         Transceiver codes                         : 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
>>>>         Transceiver type                          : 10G Ethernet: 10G Base-SR
>>>>         Encoding                                  : 0x06 (64B/66B)
>>>>         BR, Nominal                               : 10300MBd
>>>>         Rate identifier                           : 0x00 (unspecified)
>>>>         Length (SMF,km)                           : 0km
>>>>         Length (SMF)                              : 0m
>>>>         Length (50um)                             : 80m
>>>>         Length (62.5um)                           : 30m
>>>>         Length (Copper)                           : 0m
>>>>         Length (OM3)                              : 300m
>>>>         Laser wavelength                          : 850nm
>>>>         Vendor name                               : IBM-Avago
>>>>         Vendor OUI                                : <REDACTED>
>>>>         Vendor PN                                 : <REDACTED>
>>>>         Vendor rev                                : G2.3
>>>>         Option values                             : 0x00 0x1a
>>>>         Option                                    : RX_LOS implemented
>>>>         Option                                    : TX_FAULT implemented
>>>>         Option                                    : TX_DISABLE implemented
>>>>         BR margin, max                            : 0%
>>>>         BR margin, min                            : 0%
>>>>         Vendor SN                                 : <REDACTED>
>>>>         Date code                                 : <REDACTED>
>>>> netlink error: Unknown error 1820
>>
>> Thanks,
>>  Krzysztof
>>
> 

