Message-ID: <82CEAF9FFBA4DD428B132074FB91DF7D5F6481EA@CSI-MAILSRV.csicompanies.internal>
Date:   Wed, 26 Sep 2018 19:29:23 +0000
From:   Chris Preimesberger <chrisp@...nsition.com>
To:     "linville@...driver.com" <linville@...driver.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: bug: 'ethtool -m' reports spurious alarm & warning threshold
 values for QSFP28 transceivers

Hello,

I'm re-sending this in plain text, per the auto-reply from a spam filter.  This time I have attached text files that explain the situation described below, in case the plain-text conversion has made the original email's formatting hard to follow.

Thank you and best regards.


Chris Preimesberger | Test & Validation Engineer
Transition Networks, Inc.

chrisp@...nsition.com
direct: +1.952.996.1509 | fax: +1.952.941.2322 | www.transition.com
________________________________________





From: Chris Preimesberger 
Sent: Wednesday, September 26, 2018 2:14 PM
To: 'linville@...driver.com'; 'netdev@...r.kernel.org'
Subject: bug: 'ethtool -m' reports spurious alarm & warning threshold values for QSFP28 transceivers

Hello John, All,


I think I may have found a bug or two in ethtool, with respect to its reporting of a QSFP28 transceiver's diagnostic information.  Ethtool seems to correctly report all diagnostic information about QSFP28 transceivers, except for the transceiver's warning and alarm thresholds.  I'm not sure whether the spurious warning and alarm values that get reported are the fault of ethtool or my NIC/driver, and I have no other models of 100GbE NICs to test with.  I've contacted Mellanox support about this, and they point the finger at ethtool.  Can these issues be investigated by ethtool developers?  Here is some background information about the equipment and software used when I observe these issues:

Equipment used:
NIC: Mellanox ConnectX-4 100GbE, part number MCX415A-CCAT
Transceiver: Any 40Gb or 100Gb QSFP28 transceiver installed in the NIC (Intel, Mellanox, Transition Networks, etc.)

Software used:
Ubuntu 18.04 with the distro's packaged NIC driver and ethtool v4.15.
Also tested: ethtool v4.18 compiled from source, and the current Mellanox OFED driver.

All test scenarios produced the same bugs.


Bug #1.  Ethtool's reporting of the installed transceiver's alarm and warning thresholds differs depending on whether or not ethtool's output is piped to another command.  Example commands are below; the differing values are in the threshold lines at the end of each output:


tech1@D8:~$ sudo ethtool -m enp1s0
        Identifier                                : 0x11 (QSFP28)
        Extended identifier                       : 0xfc
        Extended identifier description           : 3.5W max. Power consumption
        Extended identifier description           : CDR present in TX, CDR present in RX
        Extended identifier description           : High Power Class (> 3.5 W) not enabled
        Connector                                 : 0x07 (LC)
        Transceiver codes                         : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
        Transceiver type                          : 100G Ethernet: 100G CWDM4 MSA with FEC
        Encoding                                  : 0x03 (NRZ)
        BR, Nominal                               : 25500Mbps
        Rate identifier                           : 0x00
        Length (SMF,km)                           : 2km
        Length (OM3 50um)                         : 0m
        Length (OM2 50um)                         : 0m
        Length (OM1 62.5um)                       : 0m
        Length (Copper or Active cable)           : 0m
        Transmitter technology                    : 0x40 (1310 nm DFB)
        Laser wavelength                          : 1310.000nm
        Laser wavelength tolerance                : 47.500nm
        Vendor name                               : TRANSITION
        Vendor OUI                                : 00:c0:f2
        Vendor PN                                 : TNQSFP100GCWDM4
        Vendor rev                                : 1A
        Vendor SN                                 : TN02000302
        Date code                                 : 180919
        Revision Compliance                       : SFF-8636 Rev 2.5/2.6/2.7
        Module temperature                        : 39.53 degrees C / 103.15 degrees F
        Module voltage                            : 3.3241 V
        Alarm/warning flags implemented           : Yes
        Laser tx bias current (Channel 1)         : 34.432 mA
        Laser tx bias current (Channel 2)         : 34.432 mA
        Laser tx bias current (Channel 3)         : 33.408 mA
        Laser tx bias current (Channel 4)         : 33.920 mA
        Transmit avg optical power (Channel 1)    : 0.9048 mW / -0.43 dBm
        Transmit avg optical power (Channel 2)    : 0.7832 mW / -1.06 dBm
        Transmit avg optical power (Channel 3)    : 0.8057 mW / -0.94 dBm
        Transmit avg optical power (Channel 4)    : 0.7014 mW / -1.54 dBm
        Rcvr signal avg optical power(Channel 1)  : 0.7378 mW / -1.32 dBm
        Rcvr signal avg optical power(Channel 2)  : 0.7553 mW / -1.22 dBm
        Rcvr signal avg optical power(Channel 3)  : 0.6529 mW / -1.85 dBm
        Rcvr signal avg optical power(Channel 4)  : 0.6847 mW / -1.64 dBm
        Laser bias current high alarm   (Chan 1)  : Off
        Laser bias current low alarm    (Chan 1)  : Off
        Laser bias current high warning (Chan 1)  : Off
        Laser bias current low warning  (Chan 1)  : Off
        Laser bias current high alarm   (Chan 2)  : Off
        Laser bias current low alarm    (Chan 2)  : Off
        Laser bias current high warning (Chan 2)  : Off
        Laser bias current low warning  (Chan 2)  : Off
        Laser bias current high alarm   (Chan 3)  : Off
        Laser bias current low alarm    (Chan 3)  : Off
        Laser bias current high warning (Chan 3)  : Off
        Laser bias current low warning  (Chan 3)  : Off
        Laser bias current high alarm   (Chan 4)  : Off
        Laser bias current low alarm    (Chan 4)  : Off
        Laser bias current high warning (Chan 4)  : Off
        Laser bias current low warning  (Chan 4)  : Off
        Module temperature high alarm             : Off
        Module temperature low alarm              : Off
        Module temperature high warning           : Off
        Module temperature low warning            : Off
        Module voltage high alarm                 : Off
        Module voltage low alarm                  : Off
        Module voltage high warning               : Off
        Module voltage low warning                : Off
        Laser tx power high alarm   (Channel 1)   : Off
        Laser tx power low alarm    (Channel 1)   : Off
        Laser tx power high warning (Channel 1)   : Off
        Laser tx power low warning  (Channel 1)   : Off
        Laser tx power high alarm   (Channel 2)   : Off
        Laser tx power low alarm    (Channel 2)   : Off
        Laser tx power high warning (Channel 2)   : Off
        Laser tx power low warning  (Channel 2)   : Off
        Laser tx power high alarm   (Channel 3)   : Off
        Laser tx power low alarm    (Channel 3)   : Off
        Laser tx power high warning (Channel 3)   : Off
        Laser tx power low warning  (Channel 3)   : Off
        Laser tx power high alarm   (Channel 4)   : Off
        Laser tx power low alarm    (Channel 4)   : Off
        Laser tx power high warning (Channel 4)   : Off
        Laser tx power low warning  (Channel 4)   : Off
        Laser rx power high alarm   (Channel 1)   : Off
        Laser rx power low alarm    (Channel 1)   : Off
        Laser rx power high warning (Channel 1)   : Off
        Laser rx power low warning  (Channel 1)   : Off
        Laser rx power high alarm   (Channel 2)   : Off
        Laser rx power low alarm    (Channel 2)   : Off
        Laser rx power high warning (Channel 2)   : Off
        Laser rx power low warning  (Channel 2)   : Off
        Laser rx power high alarm   (Channel 3)   : Off
        Laser rx power low alarm    (Channel 3)   : Off
        Laser rx power high warning (Channel 3)   : Off
        Laser rx power low warning  (Channel 3)   : Off
        Laser rx power high alarm   (Channel 4)   : Off
        Laser rx power low alarm    (Channel 4)   : Off
        Laser rx power high warning (Channel 4)   : Off
        Laser rx power low warning  (Channel 4)   : Off
        Laser bias current high alarm threshold   : 0.000 mA
        Laser bias current low alarm threshold    : 0.000 mA
        Laser bias current high warning threshold : 0.000 mA
        Laser bias current low warning threshold  : 0.000 mA
        Laser output power high alarm threshold   : 0.0000 mW / -inf dBm
        Laser output power low alarm threshold    : 0.0000 mW / -inf dBm
        Laser output power high warning threshold : 0.0000 mW / -inf dBm
        Laser output power low warning threshold  : 0.0000 mW / -inf dBm
        Module temperature high alarm threshold   : 0.00 degrees C / 32.00 degrees F
        Module temperature low alarm threshold    : 0.00 degrees C / 32.00 degrees F
        Module temperature high warning threshold : 0.00 degrees C / 32.00 degrees F
        Module temperature low warning threshold  : 0.00 degrees C / 32.00 degrees F
        Module voltage high alarm threshold       : 0.0000 V
        Module voltage low alarm threshold        : 0.0000 V
        Module voltage high warning threshold     : 0.0000 V
        Module voltage low warning threshold      : 0.0000 V
        Laser rx power high alarm threshold       : 0.0000 mW / -inf dBm
        Laser rx power low alarm threshold        : 0.0000 mW / -inf dBm
        Laser rx power high warning threshold     : 0.0000 mW / -inf dBm
        Laser rx power low warning threshold      : 0.0000 mW / -inf dBm


tech1@D8:~$ sudo ethtool -m enp1s0 | cat
        Identifier                                : 0x11 (QSFP28)
        Extended identifier                       : 0xfc
        Extended identifier description           : 3.5W max. Power consumption
        Extended identifier description           : CDR present in TX, CDR present in RX
        Extended identifier description           : High Power Class (> 3.5 W) not enabled
        Connector                                 : 0x07 (LC)
        Transceiver codes                         : 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
        Transceiver type                          : 100G Ethernet: 100G CWDM4 MSA with FEC
        Encoding                                  : 0x03 (NRZ)
        BR, Nominal                               : 25500Mbps
        Rate identifier                           : 0x00
        Length (SMF,km)                           : 2km
        Length (OM3 50um)                         : 0m
        Length (OM2 50um)                         : 0m
        Length (OM1 62.5um)                       : 0m
        Length (Copper or Active cable)           : 0m
        Transmitter technology                    : 0x40 (1310 nm DFB)
        Laser wavelength                          : 1310.000nm
        Laser wavelength tolerance                : 47.500nm
        Vendor name                               : TRANSITION
        Vendor OUI                                : 00:c0:f2
        Vendor PN                                 : TNQSFP100GCWDM4
        Vendor rev                                : 1A
        Vendor SN                                 : TN02000302
        Date code                                 : 180919
        Revision Compliance                       : SFF-8636 Rev 2.5/2.6/2.7
        Module temperature                        : 39.53 degrees C / 103.15 degrees F
        Module voltage                            : 3.3249 V
        Alarm/warning flags implemented           : Yes
        Laser tx bias current (Channel 1)         : 34.432 mA
        Laser tx bias current (Channel 2)         : 34.432 mA
        Laser tx bias current (Channel 3)         : 33.408 mA
        Laser tx bias current (Channel 4)         : 33.920 mA
        Transmit avg optical power (Channel 1)    : 0.9043 mW / -0.44 dBm
        Transmit avg optical power (Channel 2)    : 0.7832 mW / -1.06 dBm
        Transmit avg optical power (Channel 3)    : 0.8057 mW / -0.94 dBm
        Transmit avg optical power (Channel 4)    : 0.7009 mW / -1.54 dBm
        Rcvr signal avg optical power(Channel 1)  : 0.7378 mW / -1.32 dBm
        Rcvr signal avg optical power(Channel 2)  : 0.7553 mW / -1.22 dBm
        Rcvr signal avg optical power(Channel 3)  : 0.6529 mW / -1.85 dBm
        Rcvr signal avg optical power(Channel 4)  : 0.6847 mW / -1.64 dBm
        Laser bias current high alarm   (Chan 1)  : Off
        Laser bias current low alarm    (Chan 1)  : Off
        Laser bias current high warning (Chan 1)  : Off
        Laser bias current low warning  (Chan 1)  : Off
        Laser bias current high alarm   (Chan 2)  : Off
        Laser bias current low alarm    (Chan 2)  : Off
        Laser bias current high warning (Chan 2)  : Off
        Laser bias current low warning  (Chan 2)  : Off
        Laser bias current high alarm   (Chan 3)  : Off
        Laser bias current low alarm    (Chan 3)  : Off
        Laser bias current high warning (Chan 3)  : Off
        Laser bias current low warning  (Chan 3)  : Off
        Laser bias current high alarm   (Chan 4)  : Off
        Laser bias current low alarm    (Chan 4)  : Off
        Laser bias current high warning (Chan 4)  : Off
        Laser bias current low warning  (Chan 4)  : Off
        Module temperature high alarm             : Off
        Module temperature low alarm              : Off
        Module temperature high warning           : Off
        Module temperature low warning            : Off
        Module voltage high alarm                 : Off
        Module voltage low alarm                  : Off
        Module voltage high warning               : Off
        Module voltage low warning                : Off
        Laser tx power high alarm   (Channel 1)   : Off
        Laser tx power low alarm    (Channel 1)   : Off
        Laser tx power high warning (Channel 1)   : Off
        Laser tx power low warning  (Channel 1)   : Off
        Laser tx power high alarm   (Channel 2)   : Off
        Laser tx power low alarm    (Channel 2)   : Off
        Laser tx power high warning (Channel 2)   : Off
        Laser tx power low warning  (Channel 2)   : Off
        Laser tx power high alarm   (Channel 3)   : Off
        Laser tx power low alarm    (Channel 3)   : Off
        Laser tx power high warning (Channel 3)   : Off
        Laser tx power low warning  (Channel 3)   : Off
        Laser tx power high alarm   (Channel 4)   : Off
        Laser tx power low alarm    (Channel 4)   : Off
        Laser tx power high warning (Channel 4)   : Off
        Laser tx power low warning  (Channel 4)   : Off
        Laser rx power high alarm   (Channel 1)   : Off
        Laser rx power low alarm    (Channel 1)   : Off
        Laser rx power high warning (Channel 1)   : Off
        Laser rx power low warning  (Channel 1)   : Off
        Laser rx power high alarm   (Channel 2)   : Off
        Laser rx power low alarm    (Channel 2)   : Off
        Laser rx power high warning (Channel 2)   : Off
        Laser rx power low warning  (Channel 2)   : Off
        Laser rx power high alarm   (Channel 3)   : Off
        Laser rx power low alarm    (Channel 3)   : Off
        Laser rx power high warning (Channel 3)   : Off
        Laser rx power low warning  (Channel 3)   : Off
        Laser rx power high alarm   (Channel 4)   : Off
        Laser rx power low alarm    (Channel 4)   : Off
        Laser rx power high warning (Channel 4)   : Off
        Laser rx power low warning  (Channel 4)   : Off
        Laser bias current high alarm threshold   : 16.448 mA
        Laser bias current low alarm threshold    : 16.448 mA
        Laser bias current high warning threshold : 16.448 mA
        Laser bias current low warning threshold  : 16.448 mA
        Laser output power high alarm threshold   : 0.8224 mW / -0.85 dBm
        Laser output power low alarm threshold    : 0.8250 mW / -0.84 dBm
        Laser output power high warning threshold : 0.8264 mW / -0.83 dBm
        Laser output power low warning threshold  : 2.6983 mW / 4.31 dBm
        Module temperature high alarm threshold   : 110.12 degrees C / 230.22 degrees F
        Module temperature low alarm threshold    : 84.34 degrees C / 183.82 degrees F
        Module temperature high warning threshold : 44.12 degrees C / 111.42 degrees F
        Module temperature low warning threshold  : 67.27 degrees C / 153.08 degrees F
        Module voltage high alarm threshold       : 2.9728 V
        Module voltage low alarm threshold        : 2.6990 V
        Module voltage high warning threshold     : 0.8274 V
        Module voltage low warning threshold      : 2.2538 V
        Laser rx power high alarm threshold       : 2.5458 mW / 4.06 dBm
        Laser rx power low alarm threshold        : 2.6992 mW / 4.31 dBm
        Laser rx power high warning threshold     : 2.9801 mW / 4.74 dBm
        Laser rx power low warning threshold      : 2.8526 mW / 4.55 dBm
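
To make the differences between two such captures easy to spot, a small script can pair the lines up and print only those whose values changed. This is a minimal sketch, not part of ethtool: the function names `parse_fields` and `differing_fields` are hypothetical, and it assumes both captures were saved as text and contain the same fields in the same order.

```python
def parse_fields(text):
    """Split ethtool-style 'label : value' lines into (label, value) pairs."""
    fields = []
    for line in text.splitlines():
        if " : " not in line:
            continue  # skip shell prompts and blank lines
        label, value = line.split(" : ", 1)
        fields.append((label.strip(), value.strip()))
    return fields


def differing_fields(capture_a, capture_b):
    """Return (label, value_a, value_b) for lines whose values differ.

    Lines are paired by position, because some labels repeat
    (for example 'Extended identifier description').
    """
    pairs = zip(parse_fields(capture_a), parse_fields(capture_b))
    return [
        (label_a, value_a, value_b)
        for (label_a, value_a), (label_b, value_b) in pairs
        if label_a == label_b and value_a != value_b
    ]


# Two lines taken from each of the outputs above.
terminal = """
        Module voltage                            : 3.3241 V
        Module voltage high alarm threshold       : 0.0000 V
"""
piped = """
        Module voltage                            : 3.3249 V
        Module voltage high alarm threshold       : 2.9728 V
"""

for label, a, b in differing_fields(terminal, piped):
    print(f"{label}: {a} vs {b}")
```

Live readings such as the module voltage also change slightly between runs, so they show up in the list alongside the thresholds; only the threshold lines are the subject of this report.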


Bug #2.  All of the alarm and warning threshold values reported by the above commands are spurious.
At first glance, one would assume that the threshold values reported by the piped ethtool command are correct, but they're not.  I know the programmed values for the above transceiver, which makes it easy for me to spot the spurious values, but even without knowing the programmed values of a given transceiver, one can use logic to detect when the values displayed by ethtool don't make sense.
For example, let's scrutinize the voltage warning and alarm values reported by ethtool for this transceiver.  We will compare each voltage threshold against the other voltage thresholds and look for contradictions, to determine whether the reported values seem legitimate.
                                Known           ethtool
                                Actual          Reported
                                Values          Values
High Voltage Alarm              3.70 V          2.9728 V
High Voltage Warning            3.59 V          0.8274 V
(Operating spec = 3.30 V)
Low Voltage Warning             3.00 V          2.2538 V
Low Voltage Alarm               2.90 V          2.6990 V

Contradictions for the ethtool reported voltage warning and alarm thresholds:
1. The high voltage alarm should occur at higher voltage than the operating voltage, but ethtool didn't report that.
2. The high voltage warning should occur at higher voltage than the low voltage warning and alarm, but ethtool didn't report that.
3. The low voltage warning should occur at higher voltage than the low voltage alarm, but ethtool didn't report that.
4. The low voltage alarm should occur at a lower voltage than any of the other voltage warnings and alarms, but ethtool didn't report that.
5. The current voltage value was reported as 3.3249 V, which should trigger the high voltage warning and alarm according to the reported thresholds, but no warnings or alarms are indicated.

Each of the 4 voltage thresholds reported by ethtool has contradictions, so we know something is not right.  The same kind of logic can be applied to the thresholds for temperature, laser TX power, etc. to find that those values are also spurious.
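
The checks listed above can be written down as a short script. This is only a sketch of the reasoning, not anything ethtool itself does: the function name `threshold_contradictions` is hypothetical, and it assumes the usual ordering of high alarm > high warning > low warning > low alarm, with no alarm or warning flag currently raised (as in the output above).

```python
def threshold_contradictions(high_alarm, high_warning, low_warning, low_alarm,
                             reading=None):
    """List the ways a set of four thresholds contradicts itself.

    Expected ordering: high alarm > high warning > low warning > low alarm.
    If `reading` is given, it is assumed that no alarm or warning flag is
    raised, so the reading should lie between the two warning thresholds.
    """
    problems = []
    if high_alarm <= high_warning:
        problems.append("high alarm is not above high warning")
    if high_warning <= low_warning:
        problems.append("high warning is not above low warning")
    if low_warning <= low_alarm:
        problems.append("low warning is not above low alarm")
    if reading is not None:
        if reading > high_alarm:
            problems.append("reading is above high alarm, but no flag is set")
        if reading > high_warning:
            problems.append("reading is above high warning, but no flag is set")
        if reading < low_warning:
            problems.append("reading is below low warning, but no flag is set")
        if reading < low_alarm:
            problems.append("reading is below low alarm, but no flag is set")
    return problems


# Voltage thresholds in volts, taken from the table above.
known = threshold_contradictions(3.70, 3.59, 3.00, 2.90, reading=3.3249)
reported = threshold_contradictions(2.9728, 0.8274, 2.2538, 2.6990,
                                    reading=3.3249)

print("known values:   ", known or "consistent")
for problem in reported:
    print("ethtool values: ", problem)
```

The known values pass every check, while the values reported by ethtool fail several of them. The same function can be applied to the temperature, bias current, and optical power thresholds.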


Installing the above transceiver in a Cisco switch shows that the switch correctly retrieves the true warning and alarm threshold values from the transceiver's EEPROM, so we trust that the transceiver has been programmed correctly.  The Cisco CLI output for that transceiver is shown here:

switch# show interface ethernet 1/3 transceiver details 
Ethernet1/3
    transceiver is present
    type is QSFP-100G-CWDM4-MSA-FEC
    name is TRANSITION
    part number is TNQSFP100GCWDM4
    revision is 1A
    serial number is TN02000302
    nominal bitrate is 25500 MBit/sec per channel
    Link length supported for 9/125um fiber is 2 km
    cisco id is 17
    cisco extended id number is 252

Lane Number:1 Network Lane
           SFP Detail Diagnostics Information (internal calibration)
  ----------------------------------------------------------------------------
                Current              Alarms                  Warnings
                Measurement     High        Low         High          Low
  ----------------------------------------------------------------------------
  Temperature   38.08 C        80.00 C    -10.00 C     75.00 C       -5.00 C
  Voltage        3.34 V         3.70 V      2.90 V      3.59 V        3.00 V
  Current       34.24 mA       75.00 mA    10.00 mA    70.00 mA      15.00 mA
  Tx Power      -0.44 dBm       4.49 dBm   -8.50 dBm    3.49 dBm     -7.52 dBm
  Rx Power          N/A         4.49 dBm  -14.55 dBm    3.49 dBm    -12.51 dBm
  Transmit Fault Count = 0
  ----------------------------------------------------------------------------
  Note: ++  high-alarm; +  high-warning; --  low-alarm; -  low-warning

Lane Number:2 Network Lane
           SFP Detail Diagnostics Information (internal calibration)
  ----------------------------------------------------------------------------
                Current              Alarms                  Warnings
                Measurement     High        Low         High          Low
  ----------------------------------------------------------------------------
  Temperature   38.08 C        80.00 C    -10.00 C     75.00 C       -5.00 C
  Voltage        3.34 V         3.70 V      2.90 V      3.59 V        3.00 V
  Current       34.24 mA       75.00 mA    10.00 mA    70.00 mA      15.00 mA
  Tx Power      -1.20 dBm       4.49 dBm   -8.50 dBm    3.49 dBm     -7.52 dBm
  Rx Power          N/A         4.49 dBm  -14.55 dBm    3.49 dBm    -12.51 dBm
  Transmit Fault Count = 0
  ----------------------------------------------------------------------------
  Note: ++  high-alarm; +  high-warning; --  low-alarm; -  low-warning

Lane Number:3 Network Lane
           SFP Detail Diagnostics Information (internal calibration)
  ----------------------------------------------------------------------------
                Current              Alarms                  Warnings
                Measurement     High        Low         High          Low
  ----------------------------------------------------------------------------
  Temperature   38.08 C        80.00 C    -10.00 C     75.00 C       -5.00 C
  Voltage        3.34 V         3.70 V      2.90 V      3.59 V        3.00 V
  Current       33.21 mA       75.00 mA    10.00 mA    70.00 mA      15.00 mA
  Tx Power      -0.96 dBm       4.49 dBm   -8.50 dBm    3.49 dBm     -7.52 dBm
  Rx Power          N/A         4.49 dBm  -14.55 dBm    3.49 dBm    -12.51 dBm
  Transmit Fault Count = 0
  ----------------------------------------------------------------------------
  Note: ++  high-alarm; +  high-warning; --  low-alarm; -  low-warning

Lane Number:4 Network Lane
           SFP Detail Diagnostics Information (internal calibration)
  ----------------------------------------------------------------------------
                Current              Alarms                  Warnings
                Measurement     High        Low         High          Low
  ----------------------------------------------------------------------------
  Temperature   38.08 C        80.00 C    -10.00 C     75.00 C       -5.00 C
  Voltage        3.34 V         3.70 V      2.90 V      3.59 V        3.00 V
  Current       33.72 mA       75.00 mA    10.00 mA    70.00 mA      15.00 mA
  Tx Power      -1.59 dBm       4.49 dBm   -8.50 dBm    3.49 dBm     -7.52 dBm
  Rx Power          N/A         4.49 dBm  -14.55 dBm    3.49 dBm    -12.51 dBm
  Transmit Fault Count = 0
  ----------------------------------------------------------------------------
  Note: ++  high-alarm; +  high-warning; --  low-alarm; -  low-warning

switch#
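
The Cisco output gives optical power thresholds in dBm only, while ethtool prints both mW and dBm. To compare the two, the standard conversion (0 dBm = 1 mW) can be sketched as below; the helper names `dbm_to_mw` and `mw_to_dbm` are hypothetical and are not taken from either tool.

```python
import math


def dbm_to_mw(dbm):
    """Convert optical power from dBm to milliwatts (0 dBm = 1 mW)."""
    return 10 ** (dbm / 10)


def mw_to_dbm(mw):
    """Convert optical power from milliwatts to dBm; mw must be > 0."""
    return 10 * math.log10(mw)


# Tx power thresholds as shown by the Cisco switch, in dBm.
cisco_tx_thresholds_dbm = {
    "high alarm": 4.49,
    "low alarm": -8.50,
    "high warning": 3.49,
    "low warning": -7.52,
}

for name, dbm in cisco_tx_thresholds_dbm.items():
    print(f"Tx power {name}: {dbm:.2f} dBm = {dbm_to_mw(dbm):.4f} mW")
```

For example, the switch's Tx high alarm of 4.49 dBm is roughly 2.81 mW, whereas the piped ethtool output above shows 0.8224 mW for the laser output power high alarm threshold.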


Any help with these issues is greatly appreciated.  If you have any questions or advice, please let me know.  I'll be glad to continue troubleshooting this until it's resolved.  Thank you.    


Chris Preimesberger | Test & Validation Engineer
Transition Networks, Inc.

chrisp@...nsition.com
direct: +1.952.996.1509 | fax: +1.952.941.2322 | www.transition.com
________________________________________








View attachment "ethtoolQSFP28thresholdsCiscoComparison.txt" of type "text/plain" (4602 bytes)

View attachment "ethtoolQSFP28thresholdsExpectedOutput.txt" of type "text/plain" (7258 bytes)

View attachment "ethtoolQSFP28thresholdsSpuriousOutput1of2.txt" of type "text/plain" (6843 bytes)

View attachment "ethtoolQSFP28thresholdsSpuriousOutput2of2.txt" of type "text/plain" (6866 bytes)
