Message-ID: <82CEAF9FFBA4DD428B132074FB91DF7D5F6481EA@CSI-MAILSRV.csicompanies.internal>
Date: Wed, 26 Sep 2018 19:29:23 +0000
From: Chris Preimesberger <chrisp@...nsition.com>
To: "linville@...driver.com" <linville@...driver.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: bug: 'ethtool -m' reports spurious alarm & warning threshold
values for QSFP28 transceivers
Hello,
I'm re-sending in plain text per the auto-reply from a spam filter. This time I have attached some text files that explain the situation, in case the font and formatting of the email below are now too garbled for easy comprehension.
Thank you and best regards.
Chris Preimesberger | Test & Validation Engineer
Transition Networks, Inc.
chrisp@...nsition.com
direct: +1.952.996.1509 | fax: +1.952.941.2322 | www.transition.com
________________________________________
From: Chris Preimesberger
Sent: Wednesday, September 26, 2018 2:14 PM
To: 'linville@...driver.com'; 'netdev@...r.kernel.org'
Subject: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers
Hello John, All,
I think I may have found a bug or two in ethtool, with respect to its reporting of a QSFP28 transceiver's diagnostic information. Ethtool seems to correctly report all diagnostic information about QSFP28 transceivers, except for the transceiver's warning and alarm thresholds. I'm not sure whether the spurious warning and alarm values that get reported are the fault of ethtool or my NIC/driver, and I have no other models of 100GbE NICs to test with. I've contacted Mellanox support about this, and they point the finger at ethtool. Can these issues be investigated by ethtool developers? Here is some background information about the equipment and software used when I observe these issues:
Equipment used:
NIC: Mellanox ConnectX-4 100GbE, part number MCX415A-CCAT
Transceiver: Any 40Gb or 100Gb QSFP28 transceiver installed in the NIC (Intel, Mellanox, Transition Networks, etc.)
Software used:
Ubuntu 18.04 with the distro's packaged NIC driver and ethtool v4.15
Also tested: ethtool v4.18 compiled from source, and the current Mellanox OFED driver.
All test scenarios produced the same bugs.
Bug #1. Ethtool's reporting of the installed transceiver's alarm and warning thresholds differs depending on whether or not ethtool's output is piped to another command. Example commands are below; the outputs differ in the alarm and warning threshold lines at the end of each listing:
tech1@D8:~$ sudo ethtool -m enp1s0
Identifier : 0x11 (QSFP28)
Extended identifier : 0xfc
Extended identifier description : 3.5W max. Power consumption
Extended identifier description : CDR present in TX, CDR present in RX
Extended identifier description : High Power Class (> 3.5 W) not enabled
Connector : 0x07 (LC)
Transceiver codes : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Transceiver type : 100G Ethernet: 100G CWDM4 MSA with FEC
Encoding : 0x03 (NRZ)
BR, Nominal : 25500Mbps
Rate identifier : 0x00
Length (SMF,km) : 2km
Length (OM3 50um) : 0m
Length (OM2 50um) : 0m
Length (OM1 62.5um) : 0m
Length (Copper or Active cable) : 0m
Transmitter technology : 0x40 (1310 nm DFB)
Laser wavelength : 1310.000nm
Laser wavelength tolerance : 47.500nm
Vendor name : TRANSITION
Vendor OUI : 00:c0:f2
Vendor PN : TNQSFP100GCWDM4
Vendor rev : 1A
Vendor SN : TN02000302
Date code : 180919
Revision Compliance : SFF-8636 Rev 2.5/2.6/2.7
Module temperature : 39.53 degrees C / 103.15 degrees F
Module voltage : 3.3241 V
Alarm/warning flags implemented : Yes
Laser tx bias current (Channel 1) : 34.432 mA
Laser tx bias current (Channel 2) : 34.432 mA
Laser tx bias current (Channel 3) : 33.408 mA
Laser tx bias current (Channel 4) : 33.920 mA
Transmit avg optical power (Channel 1) : 0.9048 mW / -0.43 dBm
Transmit avg optical power (Channel 2) : 0.7832 mW / -1.06 dBm
Transmit avg optical power (Channel 3) : 0.8057 mW / -0.94 dBm
Transmit avg optical power (Channel 4) : 0.7014 mW / -1.54 dBm
Rcvr signal avg optical power(Channel 1) : 0.7378 mW / -1.32 dBm
Rcvr signal avg optical power(Channel 2) : 0.7553 mW / -1.22 dBm
Rcvr signal avg optical power(Channel 3) : 0.6529 mW / -1.85 dBm
Rcvr signal avg optical power(Channel 4) : 0.6847 mW / -1.64 dBm
Laser bias current high alarm (Chan 1) : Off
Laser bias current low alarm (Chan 1) : Off
Laser bias current high warning (Chan 1) : Off
Laser bias current low warning (Chan 1) : Off
Laser bias current high alarm (Chan 2) : Off
Laser bias current low alarm (Chan 2) : Off
Laser bias current high warning (Chan 2) : Off
Laser bias current low warning (Chan 2) : Off
Laser bias current high alarm (Chan 3) : Off
Laser bias current low alarm (Chan 3) : Off
Laser bias current high warning (Chan 3) : Off
Laser bias current low warning (Chan 3) : Off
Laser bias current high alarm (Chan 4) : Off
Laser bias current low alarm (Chan 4) : Off
Laser bias current high warning (Chan 4) : Off
Laser bias current low warning (Chan 4) : Off
Module temperature high alarm : Off
Module temperature low alarm : Off
Module temperature high warning : Off
Module temperature low warning : Off
Module voltage high alarm : Off
Module voltage low alarm : Off
Module voltage high warning : Off
Module voltage low warning : Off
Laser tx power high alarm (Channel 1) : Off
Laser tx power low alarm (Channel 1) : Off
Laser tx power high warning (Channel 1) : Off
Laser tx power low warning (Channel 1) : Off
Laser tx power high alarm (Channel 2) : Off
Laser tx power low alarm (Channel 2) : Off
Laser tx power high warning (Channel 2) : Off
Laser tx power low warning (Channel 2) : Off
Laser tx power high alarm (Channel 3) : Off
Laser tx power low alarm (Channel 3) : Off
Laser tx power high warning (Channel 3) : Off
Laser tx power low warning (Channel 3) : Off
Laser tx power high alarm (Channel 4) : Off
Laser tx power low alarm (Channel 4) : Off
Laser tx power high warning (Channel 4) : Off
Laser tx power low warning (Channel 4) : Off
Laser rx power high alarm (Channel 1) : Off
Laser rx power low alarm (Channel 1) : Off
Laser rx power high warning (Channel 1) : Off
Laser rx power low warning (Channel 1) : Off
Laser rx power high alarm (Channel 2) : Off
Laser rx power low alarm (Channel 2) : Off
Laser rx power high warning (Channel 2) : Off
Laser rx power low warning (Channel 2) : Off
Laser rx power high alarm (Channel 3) : Off
Laser rx power low alarm (Channel 3) : Off
Laser rx power high warning (Channel 3) : Off
Laser rx power low warning (Channel 3) : Off
Laser rx power high alarm (Channel 4) : Off
Laser rx power low alarm (Channel 4) : Off
Laser rx power high warning (Channel 4) : Off
Laser rx power low warning (Channel 4) : Off
Laser bias current high alarm threshold : 0.000 mA
Laser bias current low alarm threshold : 0.000 mA
Laser bias current high warning threshold : 0.000 mA
Laser bias current low warning threshold : 0.000 mA
Laser output power high alarm threshold : 0.0000 mW / -inf dBm
Laser output power low alarm threshold : 0.0000 mW / -inf dBm
Laser output power high warning threshold : 0.0000 mW / -inf dBm
Laser output power low warning threshold : 0.0000 mW / -inf dBm
Module temperature high alarm threshold : 0.00 degrees C / 32.00 degrees F
Module temperature low alarm threshold : 0.00 degrees C / 32.00 degrees F
Module temperature high warning threshold : 0.00 degrees C / 32.00 degrees F
Module temperature low warning threshold : 0.00 degrees C / 32.00 degrees F
Module voltage high alarm threshold : 0.0000 V
Module voltage low alarm threshold : 0.0000 V
Module voltage high warning threshold : 0.0000 V
Module voltage low warning threshold : 0.0000 V
Laser rx power high alarm threshold : 0.0000 mW / -inf dBm
Laser rx power low alarm threshold : 0.0000 mW / -inf dBm
Laser rx power high warning threshold : 0.0000 mW / -inf dBm
Laser rx power low warning threshold : 0.0000 mW / -inf dBm
tech1@D8:~$ sudo ethtool -m enp1s0 | cat
Identifier : 0x11 (QSFP28)
Extended identifier : 0xfc
Extended identifier description : 3.5W max. Power consumption
Extended identifier description : CDR present in TX, CDR present in RX
Extended identifier description : High Power Class (> 3.5 W) not enabled
Connector : 0x07 (LC)
Transceiver codes : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Transceiver type : 100G Ethernet: 100G CWDM4 MSA with FEC
Encoding : 0x03 (NRZ)
BR, Nominal : 25500Mbps
Rate identifier : 0x00
Length (SMF,km) : 2km
Length (OM3 50um) : 0m
Length (OM2 50um) : 0m
Length (OM1 62.5um) : 0m
Length (Copper or Active cable) : 0m
Transmitter technology : 0x40 (1310 nm DFB)
Laser wavelength : 1310.000nm
Laser wavelength tolerance : 47.500nm
Vendor name : TRANSITION
Vendor OUI : 00:c0:f2
Vendor PN : TNQSFP100GCWDM4
Vendor rev : 1A
Vendor SN : TN02000302
Date code : 180919
Revision Compliance : SFF-8636 Rev 2.5/2.6/2.7
Module temperature : 39.53 degrees C / 103.15 degrees F
Module voltage : 3.3249 V
Alarm/warning flags implemented : Yes
Laser tx bias current (Channel 1) : 34.432 mA
Laser tx bias current (Channel 2) : 34.432 mA
Laser tx bias current (Channel 3) : 33.408 mA
Laser tx bias current (Channel 4) : 33.920 mA
Transmit avg optical power (Channel 1) : 0.9043 mW / -0.44 dBm
Transmit avg optical power (Channel 2) : 0.7832 mW / -1.06 dBm
Transmit avg optical power (Channel 3) : 0.8057 mW / -0.94 dBm
Transmit avg optical power (Channel 4) : 0.7009 mW / -1.54 dBm
Rcvr signal avg optical power(Channel 1) : 0.7378 mW / -1.32 dBm
Rcvr signal avg optical power(Channel 2) : 0.7553 mW / -1.22 dBm
Rcvr signal avg optical power(Channel 3) : 0.6529 mW / -1.85 dBm
Rcvr signal avg optical power(Channel 4) : 0.6847 mW / -1.64 dBm
Laser bias current high alarm (Chan 1) : Off
Laser bias current low alarm (Chan 1) : Off
Laser bias current high warning (Chan 1) : Off
Laser bias current low warning (Chan 1) : Off
Laser bias current high alarm (Chan 2) : Off
Laser bias current low alarm (Chan 2) : Off
Laser bias current high warning (Chan 2) : Off
Laser bias current low warning (Chan 2) : Off
Laser bias current high alarm (Chan 3) : Off
Laser bias current low alarm (Chan 3) : Off
Laser bias current high warning (Chan 3) : Off
Laser bias current low warning (Chan 3) : Off
Laser bias current high alarm (Chan 4) : Off
Laser bias current low alarm (Chan 4) : Off
Laser bias current high warning (Chan 4) : Off
Laser bias current low warning (Chan 4) : Off
Module temperature high alarm : Off
Module temperature low alarm : Off
Module temperature high warning : Off
Module temperature low warning : Off
Module voltage high alarm : Off
Module voltage low alarm : Off
Module voltage high warning : Off
Module voltage low warning : Off
Laser tx power high alarm (Channel 1) : Off
Laser tx power low alarm (Channel 1) : Off
Laser tx power high warning (Channel 1) : Off
Laser tx power low warning (Channel 1) : Off
Laser tx power high alarm (Channel 2) : Off
Laser tx power low alarm (Channel 2) : Off
Laser tx power high warning (Channel 2) : Off
Laser tx power low warning (Channel 2) : Off
Laser tx power high alarm (Channel 3) : Off
Laser tx power low alarm (Channel 3) : Off
Laser tx power high warning (Channel 3) : Off
Laser tx power low warning (Channel 3) : Off
Laser tx power high alarm (Channel 4) : Off
Laser tx power low alarm (Channel 4) : Off
Laser tx power high warning (Channel 4) : Off
Laser tx power low warning (Channel 4) : Off
Laser rx power high alarm (Channel 1) : Off
Laser rx power low alarm (Channel 1) : Off
Laser rx power high warning (Channel 1) : Off
Laser rx power low warning (Channel 1) : Off
Laser rx power high alarm (Channel 2) : Off
Laser rx power low alarm (Channel 2) : Off
Laser rx power high warning (Channel 2) : Off
Laser rx power low warning (Channel 2) : Off
Laser rx power high alarm (Channel 3) : Off
Laser rx power low alarm (Channel 3) : Off
Laser rx power high warning (Channel 3) : Off
Laser rx power low warning (Channel 3) : Off
Laser rx power high alarm (Channel 4) : Off
Laser rx power low alarm (Channel 4) : Off
Laser rx power high warning (Channel 4) : Off
Laser rx power low warning (Channel 4) : Off
Laser bias current high alarm threshold : 16.448 mA
Laser bias current low alarm threshold : 16.448 mA
Laser bias current high warning threshold : 16.448 mA
Laser bias current low warning threshold : 16.448 mA
Laser output power high alarm threshold : 0.8224 mW / -0.85 dBm
Laser output power low alarm threshold : 0.8250 mW / -0.84 dBm
Laser output power high warning threshold : 0.8264 mW / -0.83 dBm
Laser output power low warning threshold : 2.6983 mW / 4.31 dBm
Module temperature high alarm threshold : 110.12 degrees C / 230.22 degrees F
Module temperature low alarm threshold : 84.34 degrees C / 183.82 degrees F
Module temperature high warning threshold : 44.12 degrees C / 111.42 degrees F
Module temperature low warning threshold : 67.27 degrees C / 153.08 degrees F
Module voltage high alarm threshold : 2.9728 V
Module voltage low alarm threshold : 2.6990 V
Module voltage high warning threshold : 0.8274 V
Module voltage low warning threshold : 2.2538 V
Laser rx power high alarm threshold : 2.5458 mW / 4.06 dBm
Laser rx power low alarm threshold : 2.6992 mW / 4.31 dBm
Laser rx power high warning threshold : 2.9801 mW / 4.74 dBm
Laser rx power low warning threshold : 2.8526 mW / 4.55 dBm
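The two listings above are long, so the differing lines are easy to miss. The following is a minimal sketch, assuming both outputs have been saved as text, that extracts only the lines whose field name ends in "threshold" and reports the ones that differ. The function names are hypothetical and are not part of ethtool; live readings such as module voltage are ignored on purpose, since they change slightly between runs.

```python
def parse_thresholds(text):
    """Return {field name: value} for 'name : value' lines ending in 'threshold'."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" : ")
        name = name.strip()
        if sep and name.endswith("threshold"):
            fields[name] = value.strip()
    return fields


def diff_thresholds(direct_output, piped_output):
    """Return {field: (direct value, piped value)} for thresholds that differ."""
    direct = parse_thresholds(direct_output)
    piped = parse_thresholds(piped_output)
    names = sorted(set(direct) | set(piped))
    return {n: (direct.get(n), piped.get(n))
            for n in names if direct.get(n) != piped.get(n)}


if __name__ == "__main__":
    # Two-line excerpts from the listings above stand in for the full captures.
    direct = ("Module voltage : 3.3241 V\n"
              "Module voltage high alarm threshold : 0.0000 V\n")
    piped = ("Module voltage : 3.3249 V\n"
             "Module voltage high alarm threshold : 2.9728 V\n")
    for name, (a, b) in diff_thresholds(direct, piped).items():
        print(f"{name}: direct={a} piped={b}")
```

With the full captures, every threshold line would be expected to show up in the result, since the direct run reports zeros throughout.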
Bug #2. All of the alarm and warning threshold values reported in the above commands are spurious.
At first glance, one would assume that the threshold values reported by the piped ethtool command are correct, but they're not. I know the programmed values for the above transceiver, so that makes it easy for me to spot the spurious values, but even without knowing the programmed values of a given transceiver, one can use logic to detect when the ethtool displayed values don't make sense.
For example, let's scrutinize the voltage warning and alarm values that ethtool reports for this transceiver. We will compare each voltage threshold against the other voltage thresholds and look for contradictions, to determine whether the reported values are plausible.
                          Known Actual    ethtool Reported
                          Values          Values
High Voltage Alarm        3.70 V          2.9728 V
High Voltage Warning      3.59 V          0.8274 V
(Operating spec = 3.30 V)
Low Voltage Warning       3.00 V          2.2538 V
Low Voltage Alarm         2.90 V          2.6990 V
Contradictions for the ethtool reported voltage warning and alarm thresholds:
1. The high voltage alarm should occur at higher voltage than the operating voltage, but ethtool didn't report that.
2. The high voltage warning should occur at higher voltage than the low voltage warning and alarm, but ethtool didn't report that.
3. The low voltage warning should occur at higher voltage than the low voltage alarm, but ethtool didn't report that.
4. The low voltage alarm should occur at a lower voltage than any of the other voltage warnings and alarms, but ethtool didn't report that.
5. The current voltage value was reported as 3.3249V, which should trigger high voltage warning and alarm, according to the reported thresholds, but no warnings or alarms are indicated.
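The checks listed above can be sketched as a small routine. This is only an illustration of the reasoning, under the assumption that a sane set of thresholds satisfies high alarm > high warning > low warning > low alarm; the function and parameter names are hypothetical.

```python
def check_thresholds(high_alarm, high_warn, low_warn, low_alarm,
                     current=None, flag_raised=False):
    """Return a list of contradictions found in one set of thresholds."""
    problems = []
    if not high_alarm > high_warn:
        problems.append("high alarm is not above high warning")
    if not high_warn > low_warn:
        problems.append("high warning is not above low warning")
    if not low_warn > low_alarm:
        problems.append("low warning is not above low alarm")
    # A reading outside the warning window should have raised a flag.
    if current is not None and not flag_raised:
        if not (low_warn <= current <= high_warn):
            problems.append("reading is outside the warning window "
                            "but no warning or alarm is flagged")
    return problems


if __name__ == "__main__":
    # Voltage values reported by the piped ethtool run (volts).
    for problem in check_thresholds(2.9728, 0.8274, 2.2538, 2.6990,
                                    current=3.3249):
        print("ethtool values:", problem)
    # Known programmed values for the same transceiver (volts).
    print("known values:", check_thresholds(3.70, 3.59, 3.00, 2.90,
                                            current=3.3249))
```

The same routine could be reused for temperature, bias current and optical power by passing those four thresholds and the matching reading.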
Each of the 4 voltage thresholds reported by ethtool has a contradiction, so we know something is not right. The same kind of logic can be applied to the thresholds for temperature, laser TX power, etc. to find that those values are also spurious.
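One further check is to convert the reported voltage thresholds back into the raw 16-bit words they would have been decoded from. The sketch below assumes the usual SFF-8636 encoding for supply voltage (unsigned big-endian word, 100 microvolts per count) and assumes the four thresholds are stored in the order ethtool prints them; both are assumptions on my part, not something confirmed from the ethtool source.

```python
def volts_to_raw_bytes(volts):
    """Encode a voltage as a big-endian 16-bit word, assuming 100 uV per count."""
    raw = round(volts * 10000)
    return raw.to_bytes(2, "big")


if __name__ == "__main__":
    # Piped-run values: high alarm, low alarm, high warning, low warning.
    reported = [2.9728, 2.6990, 0.8274, 2.2538]
    data = b"".join(volts_to_raw_bytes(v) for v in reported)
    print(data)  # b't in RX\n'
```

Under those assumptions the eight bytes are printable ASCII and match the tail of the "CDR present in TX, CDR present in RX" line printed earlier in the same listing. That hints that the threshold values may be coming from unrelated memory rather than from the module, but on its own it does not show whether ethtool or the driver is responsible.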
Installing the above transceiver in a Cisco switch shows that the Cisco correctly retrieves the true warning and alarm threshold values from the transceiver's EEPROM, so we trust that the transceiver has been correctly programmed. The Cisco CLI output for that transceiver is shown here:
switch# show interface ethernet 1/3 transceiver details
Ethernet1/3
transceiver is present
type is QSFP-100G-CWDM4-MSA-FEC
name is TRANSITION
part number is TNQSFP100GCWDM4
revision is 1A
serial number is TN02000302
nominal bitrate is 25500 MBit/sec per channel
Link length supported for 9/125um fiber is 2 km
cisco id is 17
cisco extended id number is 252
Lane Number:1 Network Lane
SFP Detail Diagnostics Information (internal calibration)
----------------------------------------------------------------------------
              Current      Alarms                    Warnings
              Measurement  High         Low          High         Low
----------------------------------------------------------------------------
Temperature   38.08 C      80.00 C      -10.00 C     75.00 C      -5.00 C
Voltage       3.34 V       3.70 V       2.90 V       3.59 V       3.00 V
Current       34.24 mA     75.00 mA     10.00 mA     70.00 mA     15.00 mA
Tx Power      -0.44 dBm    4.49 dBm     -8.50 dBm    3.49 dBm     -7.52 dBm
Rx Power      N/A          4.49 dBm     -14.55 dBm   3.49 dBm     -12.51 dBm
Transmit Fault Count = 0
----------------------------------------------------------------------------
Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
Lane Number:2 Network Lane
SFP Detail Diagnostics Information (internal calibration)
----------------------------------------------------------------------------
              Current      Alarms                    Warnings
              Measurement  High         Low          High         Low
----------------------------------------------------------------------------
Temperature   38.08 C      80.00 C      -10.00 C     75.00 C      -5.00 C
Voltage       3.34 V       3.70 V       2.90 V       3.59 V       3.00 V
Current       34.24 mA     75.00 mA     10.00 mA     70.00 mA     15.00 mA
Tx Power      -1.20 dBm    4.49 dBm     -8.50 dBm    3.49 dBm     -7.52 dBm
Rx Power      N/A          4.49 dBm     -14.55 dBm   3.49 dBm     -12.51 dBm
Transmit Fault Count = 0
----------------------------------------------------------------------------
Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
Lane Number:3 Network Lane
SFP Detail Diagnostics Information (internal calibration)
----------------------------------------------------------------------------
              Current      Alarms                    Warnings
              Measurement  High         Low          High         Low
----------------------------------------------------------------------------
Temperature   38.08 C      80.00 C      -10.00 C     75.00 C      -5.00 C
Voltage       3.34 V       3.70 V       2.90 V       3.59 V       3.00 V
Current       33.21 mA     75.00 mA     10.00 mA     70.00 mA     15.00 mA
Tx Power      -0.96 dBm    4.49 dBm     -8.50 dBm    3.49 dBm     -7.52 dBm
Rx Power      N/A          4.49 dBm     -14.55 dBm   3.49 dBm     -12.51 dBm
Transmit Fault Count = 0
----------------------------------------------------------------------------
Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
Lane Number:4 Network Lane
SFP Detail Diagnostics Information (internal calibration)
----------------------------------------------------------------------------
              Current      Alarms                    Warnings
              Measurement  High         Low          High         Low
----------------------------------------------------------------------------
Temperature   38.08 C      80.00 C      -10.00 C     75.00 C      -5.00 C
Voltage       3.34 V       3.70 V       2.90 V       3.59 V       3.00 V
Current       33.72 mA     75.00 mA     10.00 mA     70.00 mA     15.00 mA
Tx Power      -1.59 dBm    4.49 dBm     -8.50 dBm    3.49 dBm     -7.52 dBm
Rx Power      N/A          4.49 dBm     -14.55 dBm   3.49 dBm     -12.51 dBm
Transmit Fault Count = 0
----------------------------------------------------------------------------
Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
switch#
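The Cisco output gives optical power thresholds in dBm only, while ethtool prints both mW and dBm. A minimal sketch of the standard conversion (power in mW = 10^(dBm/10)) makes the two easier to compare; the function names are hypothetical.

```python
import math


def dbm_to_mw(dbm):
    """Convert optical power from dBm to milliwatts."""
    return 10 ** (dbm / 10)


def mw_to_dbm(mw):
    """Convert milliwatts to dBm; zero power maps to -inf, as ethtool prints it."""
    return 10 * math.log10(mw) if mw > 0 else float("-inf")


if __name__ == "__main__":
    # Tx and Rx power thresholds from the Cisco output above, in dBm.
    for label, dbm in [("Tx high alarm", 4.49), ("Tx low alarm", -8.50),
                       ("Rx high alarm", 4.49), ("Rx low alarm", -14.55)]:
        print(f"{label}: {dbm:.2f} dBm = {dbm_to_mw(dbm):.4f} mW")
```

The zero case is handled explicitly because the non-piped ethtool run reports 0.0000 mW, which has no finite dBm value.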
Any help with these issues is greatly appreciated. If you have any questions or advice, please let me know. I'll be glad to continue troubleshooting this until it's resolved. Thank you.
________________________________________
View attachment "ethtoolQSFP28thresholdsCiscoComparison.txt" of type "text/plain" (4602 bytes)
View attachment "ethtoolQSFP28thresholdsExpectedOutput.txt" of type "text/plain" (7258 bytes)
View attachment "ethtoolQSFP28thresholdsSpuriousOutput1of2.txt" of type "text/plain" (6843 bytes)
View attachment "ethtoolQSFP28thresholdsSpuriousOutput2of2.txt" of type "text/plain" (6866 bytes)