lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 22 May 2024 22:29:43 -0700
From: Krzysztof Olędzki <ole@....pl>
To: Andrew Lunn <andrew@...n.ch>, Ido Schimmel <idosch@...dia.com>
Cc: Michal Kubecek <mkubecek@...e.cz>, Moshe Shemesh <moshe@...dia.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>, tariqt@...dia.com
Subject: Re: "netlink error: Invalid argument" with ethtool-5.13+ on recent
 kernels due to "ethtool: Add netlink handler for getmodule (-m)" -
 25b64c66f58d3df0ad7272dda91c3ab06fe7a303, also no SFP-DOM support via
 netlink?

On 22.05.2024 at 05:44, Andrew Lunn wrote:
>>> So right, the function returns 512 for SFP and 256 for everything else, which explains why SFP does work but QSFP - not.
>>
>> Since you already did all the work and you are able to test patches, do
>> you want to fix it yourself and submit or report to the mlx4 maintainer
>> (copied)? Fix should be similar to mlx5 commit a708fb7b1f8d ("net/mlx5e:
>> ethtool, Add support for EEPROM high pages query").

Oh, thank you so much for the pointer! Turns out, it was way easier than I thought:

# ethtool -m eth2
        Identifier                                : 0x0d (QSFP+)
        Extended identifier                       : 0x00
        Extended identifier description           : 1.5W max. Power consumption
        Extended identifier description           : No CDR in TX, No CDR in RX
        Extended identifier description           : High Power Class (> 3.5 W) not enabled
        Power set                                 : On
        Power override                            : On
        Connector                                 : 0x0c (MPO Parallel Optic)
        Transceiver codes                         : 0x04 0x00 0x00 0x00 0x00 0x00 0x00 0x00
        Transceiver type                          : 40G Ethernet: 40G Base-SR4
        Encoding                                  : 0x05 (64B/66B)
        BR, Nominal                               : 10300Mbps
        Rate identifier                           : 0x00
        Length (SMF,km)                           : 0km
        Length (OM3 50um)                         : 100m
        Length (OM2 50um)                         : 0m
        Length (OM1 62.5um)                       : 0m
        Length (Copper or Active cable)           : 0m
        Transmitter technology                    : 0x00 (850 nm VCSEL)
        Laser wavelength                          : 850.000nm
        Laser wavelength tolerance                : 10.000nm
        Vendor name                               : AVAGO
        Vendor OUI                                : 00:17:6a
        Vendor PN                                 : <REDACTED>
        Vendor rev                                : 01
        Vendor SN                                 : <REDACTED>
        Date code                                 : <REDACTED>
        Revision Compliance                       : Revision not specified
        Module temperature                        : 42.17 degrees C / 107.90 degrees F
        Module voltage                            : 3.2917 V
        Alarm/warning flags implemented           : Yes
        Laser tx bias current (Channel 1)         : 6.662 mA
        Laser tx bias current (Channel 2)         : 0.000 mA
        Laser tx bias current (Channel 3)         : 0.000 mA
        Laser tx bias current (Channel 4)         : 0.000 mA
        Transmit avg optical power (Channel 1)    : 0.7119 mW / -1.48 dBm
        Transmit avg optical power (Channel 2)    : 0.0000 mW / -inf dBm
        Transmit avg optical power (Channel 3)    : 0.0000 mW / -inf dBm
        Transmit avg optical power (Channel 4)    : 0.0000 mW / -inf dBm
        Rcvr signal avg optical power(Channel 1)  : 0.7682 mW / -1.15 dBm
        Rcvr signal avg optical power(Channel 2)  : 0.0000 mW / -inf dBm
        Rcvr signal avg optical power(Channel 3)  : 0.0000 mW / -inf dBm
        Rcvr signal avg optical power(Channel 4)  : 0.0000 mW / -inf dBm
        Laser bias current high alarm   (Chan 1)  : Off
        Laser bias current low alarm    (Chan 1)  : Off
        Laser bias current high warning (Chan 1)  : Off
        Laser bias current low warning  (Chan 1)  : Off
        Laser bias current high alarm   (Chan 2)  : Off
        Laser bias current low alarm    (Chan 2)  : Off
        Laser bias current high warning (Chan 2)  : Off
        Laser bias current low warning  (Chan 2)  : Off
        Laser bias current high alarm   (Chan 3)  : Off
        Laser bias current low alarm    (Chan 3)  : Off
        Laser bias current high warning (Chan 3)  : Off
        Laser bias current low warning  (Chan 3)  : Off
        Laser bias current high alarm   (Chan 4)  : Off
        Laser bias current low alarm    (Chan 4)  : Off
        Laser bias current high warning (Chan 4)  : Off
        Laser bias current low warning  (Chan 4)  : Off
        Module temperature high alarm             : Off
        Module temperature low alarm              : Off
        Module temperature high warning           : Off
        Module temperature low warning            : Off
        Module voltage high alarm                 : Off
        Module voltage low alarm                  : Off
        Module voltage high warning               : Off
        Module voltage low warning                : Off
        Laser tx power high alarm   (Channel 1)   : Off
        Laser tx power low alarm    (Channel 1)   : Off
        Laser tx power high warning (Channel 1)   : Off
        Laser tx power low warning  (Channel 1)   : Off
        Laser tx power high alarm   (Channel 2)   : Off
        Laser tx power low alarm    (Channel 2)   : Off
        Laser tx power high warning (Channel 2)   : Off
        Laser tx power low warning  (Channel 2)   : Off
        Laser tx power high alarm   (Channel 3)   : Off
        Laser tx power low alarm    (Channel 3)   : Off
        Laser tx power high warning (Channel 3)   : Off
        Laser tx power low warning  (Channel 3)   : Off
        Laser tx power high alarm   (Channel 4)   : Off
        Laser tx power low alarm    (Channel 4)   : Off
        Laser tx power high warning (Channel 4)   : Off
        Laser tx power low warning  (Channel 4)   : Off
        Laser rx power high alarm   (Channel 1)   : Off
        Laser rx power low alarm    (Channel 1)   : Off
        Laser rx power high warning (Channel 1)   : Off
        Laser rx power low warning  (Channel 1)   : Off
        Laser rx power high alarm   (Channel 2)   : Off
        Laser rx power low alarm    (Channel 2)   : Off
        Laser rx power high warning (Channel 2)   : Off
        Laser rx power low warning  (Channel 2)   : Off
        Laser rx power high alarm   (Channel 3)   : Off
        Laser rx power low alarm    (Channel 3)   : Off
        Laser rx power high warning (Channel 3)   : Off
        Laser rx power low warning  (Channel 3)   : Off
        Laser rx power high alarm   (Channel 4)   : Off
        Laser rx power low alarm    (Channel 4)   : Off
        Laser rx power high warning (Channel 4)   : Off
        Laser rx power low warning  (Channel 4)   : Off
        Laser bias current high alarm threshold   : 10.000 mA
        Laser bias current low alarm threshold    : 0.500 mA
        Laser bias current high warning threshold : 9.500 mA
        Laser bias current low warning threshold  : 1.000 mA
        Laser output power high alarm threshold   : 0.0000 mW / -inf dBm
        Laser output power low alarm threshold    : 0.0000 mW / -inf dBm
        Laser output power high warning threshold : 0.0000 mW / -inf dBm
        Laser output power low warning threshold  : 0.0000 mW / -inf dBm
        Module temperature high alarm threshold   : 75.00 degrees C / 167.00 degrees F
        Module temperature low alarm threshold    : -5.00 degrees C / 23.00 degrees F
        Module temperature high warning threshold : 70.00 degrees C / 158.00 degrees F
        Module temperature low warning threshold  : 0.00 degrees C / 32.00 degrees F
        Module voltage high alarm threshold       : 3.6300 V
        Module voltage low alarm threshold        : 2.9700 V
        Module voltage high warning threshold     : 3.4650 V
        Module voltage low warning threshold      : 3.1349 V
        Laser rx power high alarm threshold       : 2.1878 mW / 3.40 dBm
        Laser rx power low alarm threshold        : 0.0446 mW / -13.51 dBm
        Laser rx power high warning threshold     : 1.7378 mW / 2.40 dBm
        Laser rx power low warning threshold      : 0.1122 mW / -9.50 dBm

Looks like all we need to do is:

--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c        2024-04-17 02:19:38.000000000 -0700
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c        2024-05-22 12:46:57.290947887 -0700
@@ -2055,15 +2055,15 @@
        switch (data[0] /* identifier */) {
        case MLX4_MODULE_ID_QSFP:
                modinfo->type = ETH_MODULE_SFF_8436;
-               modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN;
+               modinfo->eeprom_len = ETH_MODULE_SFF_8436_MAX_LEN;
                break;
        case MLX4_MODULE_ID_QSFP_PLUS:
                if (data[1] >= 0x3) { /* revision id */
                        modinfo->type = ETH_MODULE_SFF_8636;
-                       modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;
+                       modinfo->eeprom_len = ETH_MODULE_SFF_8636_MAX_LEN;
                } else {
                        modinfo->type = ETH_MODULE_SFF_8436;
-                       modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN;
+                       modinfo->eeprom_len = ETH_MODULE_SFF_8436_MAX_LEN;
                }
                break;
        case MLX4_MODULE_ID_QSFP28:

If I'm not mistaken, the rest of the logic is already there, such as:

static void mlx4_qsfp_eeprom_params_set(u8 *i2c_addr, u8 *page_num, u16 *offset)
{
        /* Offsets 0-255 belong to page 0.
         * Offsets 256-639 belong to pages 01, 02, 03.
         * For example, offset 400 is page 02: 1 + (400 - 256) / 128 = 2
         */
        if (*offset < I2C_PAGE_SIZE)
                *page_num = 0;
        else
                *page_num = 1 + (*offset - I2C_PAGE_SIZE) / I2C_HIGH_PAGE_SIZE;
        *i2c_addr = I2C_ADDR_LOW;
        *offset -= *page_num * I2C_HIGH_PAGE_SIZE;
}

So, we don't need to make as many changes as in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a708fb7b1f8dcc7a8ed949839958cd5d812dd939

> Before you do that, please could you work on ethtool. I would say it
> has a bug. It has been provided with 256 bytes of SPF data. It should
> be able to decode that and print it in human readable format. So the
> EINVAL should not be considered fatal to decoding.

Yes, I was also thinking this way. Luckily, similar to the situation with the mlx4 driver, all the logic is there - sff8636_dom_parse() checks if map->page_03h is set and if not, just returns gracefully.

So, all we need to do is modify sff8636_memory_map_init_pages():
@@ -1038,7 +1039,7 @@
        sff8636_request_init(&request, 0x3, SFF8636_PAGE_SIZE);
        ret = nl_get_eeprom_page(ctx, &request);
        if (ret < 0)
-               return ret;
+               return 0;
        map->page_03h = request.data - SFF8636_PAGE_SIZE;

        return 0;

As you described, we get all the data except the DOM:
        Identifier                                : 0x0d (QSFP+)
        Extended identifier                       : 0x00
        Extended identifier description           : 1.5W max. Power consumption
        Extended identifier description           : No CDR in TX, No CDR in RX
        Extended identifier description           : High Power Class (> 3.5 W) not enabled
        Power set                                 : On
        Power override                            : On
        Connector                                 : 0x0c (MPO Parallel Optic)
        Transceiver codes                         : 0x00 0x00 0x30 0x00 0x00 0x00 0x00 0x00
        Transceiver type                          : SAS 6.0G
        Transceiver type                          : SAS 3.0G
        Encoding                                  : 0x01 (8B/10B)
        BR, Nominal                               : 0Mbps
        Rate identifier                           : 0x00
        Length (SMF,km)                           : 0km
        Length (OM3 50um)                         : 0m
        Length (OM2 50um)                         : 0m
        Length (OM1 62.5um)                       : 0m
        Length (Copper or Active cable)           : 0m
        Transmitter technology                    : 0x00 (850 nm VCSEL)
        Laser wavelength                          : 850.000nm
        Laser wavelength tolerance                : 10.000nm
        Vendor name                               : AVAGO
        Vendor OUI                                : 00:17:6a
        Vendor PN                                 : <REDACTED>
        Vendor rev                                : A0
        Vendor SN                                 : <REDACTED>
        Date code                                 : <REDACTED>
        Revision Compliance                       : Revision not specified
        Rx loss of signal                         : None
        Tx loss of signal                         : None
        Tx fault                                  : None
        Module temperature                        : 40.91 degrees C / 105.63 degrees F
        Module voltage                            : 3.2848 V
        Alarm/warning flags implemented           : No
        Laser tx bias current (Channel 1)         : 6.634 mA
        Laser tx bias current (Channel 2)         : 0.000 mA
        Laser tx bias current (Channel 3)         : 0.000 mA
        Laser tx bias current (Channel 4)         : 0.000 mA
        Transmit avg optical power (Channel 1)    : 0.8101 mW / -0.91 dBm
        Transmit avg optical power (Channel 2)    : 0.0000 mW / -inf dBm
        Transmit avg optical power (Channel 3)    : 0.0000 mW / -inf dBm
        Transmit avg optical power (Channel 4)    : 0.0000 mW / -inf dBm
        Rcvr signal avg optical power(Channel 1)  : 0.5692 mW / -2.45 dBm
        Rcvr signal avg optical power(Channel 2)  : 0.0000 mW / -inf dBm
        Rcvr signal avg optical power(Channel 3)  : 0.0000 mW / -inf dBm
        Rcvr signal avg optical power(Channel 4)  : 0.0000 mW / -inf dBm

Do you think it would make sense to print a warning in such situation, or just handle this silently?

Finally, as I was looking at the code in fallback_set_params() I started thinking if the length check is actually correct?

I think instead of:
 if (offset >= modinfo->eeprom_len)
we may want:
 if (offset + length > modinfo->eeprom_len)

I don't know if it is safe to assume we always read a single page and cross page reads are not allowed and even if so, that we should rely on this instead of checking the len explicitly? What do you think?

Once I hear from y'all I will prepare the patches.

Thanks,
 Krzysztof


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ