lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c3726cb7-6eff-43c6-a7d4-1e931d48151f@ans.pl>
Date: Tue, 21 May 2024 21:54:48 -0700
From: Krzysztof Olędzki <ole@....pl>
To: Andrew Lunn <andrew@...n.ch>
Cc: Michal Kubecek <mkubecek@...e.cz>, Moshe Shemesh <moshe@...dia.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>, idosch@...dia.com
Subject: Re: "netlink error: Invalid argument" with ethtool-5.13+ on recent
 kernels due to "ethtool: Add netlink handler for getmodule (-m)" -
 25b64c66f58d3df0ad7272dda91c3ab06fe7a303, also no SFP-DOM support via
 netlink?

Hi,

On 21.05.2024 at 13:21, Andrew Lunn wrote:
>> sending genetlink packet (76 bytes):
>>     msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
>>     ETHTOOL_MSG_MODULE_EEPROM_GET
>>         ETHTOOL_A_MODULE_EEPROM_HEADER
>>             ETHTOOL_A_HEADER_DEV_NAME = "eth3"
>>         ETHTOOL_A_MODULE_EEPROM_LENGTH = 128
>>         ETHTOOL_A_MODULE_EEPROM_OFFSET = 128
>>         ETHTOOL_A_MODULE_EEPROM_PAGE = 3
>>         ETHTOOL_A_MODULE_EEPROM_BANK = 0
>>         ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS = 80
>> received genetlink packet (96 bytes):
>>     msg length 96 error errno=-22
> 
> This is a mellanox card right?

Yes, sorry. This is indeed Mellanox (now Nvidia) CX3 / CX3Pro, using the drivers/net/ethernet/mellanox/mlx4 driver.

> mlx4_en_get_module_info() and mlx4_en_get_module_eeprom() implement
> the old API for reading data from an SFP module. So the ethtool core
> will be mapping the new API to the old API. The interesting function
> is probably fallback_set_params():
> 
> https://elixir.bootlin.com/linux/latest/source/net/ethtool/eeprom.c#L29
> 
> and my guess is, you are hitting:
> 
> 	if (offset >= modinfo->eeprom_len)
> 		return -EINVAL;
> 
> offset is 3 * 128 + 128 = 512.
> 
> mlx4_en_get_module_info() is probably returning eeprom_len of 256?
> 
> Could you verify this?

Ah, excellent catch Andrew!

# egrep -R 'ETH_MODULE_SFF_[0-9]+_LEN' include/uapi/linux/ethtool.h
#define ETH_MODULE_SFF_8079_LEN         256
#define ETH_MODULE_SFF_8472_LEN         512
#define ETH_MODULE_SFF_8636_LEN         256
#define ETH_MODULE_SFF_8436_LEN         256

The code in mlx4_en_get_module_info (with length annotation):

        switch (data[0] /* identifier */) {
        case MLX4_MODULE_ID_QSFP:
                modinfo->type = ETH_MODULE_SFF_8436;
                modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN;		// 256
                break;
        case MLX4_MODULE_ID_QSFP_PLUS:
                if (data[1] >= 0x3) { /* revision id */
                        modinfo->type = ETH_MODULE_SFF_8636;
                        modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;	// 256
                } else {
                        modinfo->type = ETH_MODULE_SFF_8436;
                        modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN;	// 256
                }
                break;
        case MLX4_MODULE_ID_QSFP28:
                modinfo->type = ETH_MODULE_SFF_8636;
                modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;		// 256
                break;
        case MLX4_MODULE_ID_SFP:
                modinfo->type = ETH_MODULE_SFF_8472;
                modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN;		// 512
                break;
        default:
                return -EINVAL;
        }

So right, the function returns 512 for SFP and 256 for everything else, which explains why SFP does work but QSFP - not.

Following your advice, I added some debug printks to net/ethtool/eeprom.c:

@@ -33,16 +33,24 @@
        u32 offset = request->offset;
        u32 length = request->length;

+       printk("A: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
+
        if (request->page)
                offset = request->page * ETH_MODULE_EEPROM_PAGE_LEN + offset;

+       printk("B: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
+
        if (modinfo->type == ETH_MODULE_SFF_8472 &&
            request->i2c_address == 0x51)
                offset += ETH_MODULE_EEPROM_PAGE_LEN * 2;

+       printk("C: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
+
        if (offset >= modinfo->eeprom_len)
                return -EINVAL;

+       printk("D: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
+
        eeprom->cmd = ETHTOOL_GMODULEEEPROM;
        eeprom->len = length;
        eeprom->offset = offset;

Here is the result:

SFP:
A: offset=0, modinfo->eeprom_len=512
B: offset=0, modinfo->eeprom_len=512
C: offset=0, modinfo->eeprom_len=512
D: offset=0, modinfo->eeprom_len=512
A: offset=0, modinfo->eeprom_len=512
B: offset=0, modinfo->eeprom_len=512
C: offset=0, modinfo->eeprom_len=512
D: offset=0, modinfo->eeprom_len=512

QSFP:
A: offset=0, modinfo->eeprom_len=256
B: offset=0, modinfo->eeprom_len=256
C: offset=0, modinfo->eeprom_len=256
D: offset=0, modinfo->eeprom_len=256

A: offset=0, modinfo->eeprom_len=256
B: offset=0, modinfo->eeprom_len=256
C: offset=0, modinfo->eeprom_len=256
D: offset=0, modinfo->eeprom_len=256

A: offset=128, modinfo->eeprom_len=256
B: offset=128, modinfo->eeprom_len=256
C: offset=128, modinfo->eeprom_len=256
D: offset=128, modinfo->eeprom_len=256

A: offset=128, modinfo->eeprom_len=256
B: offset=512, modinfo->eeprom_len=256
C: offset=512, modinfo->eeprom_len=256
Note - no "D" as -EINVAL is returned exactly as you predicted.

BTW: there is another suspicious looking thing in this code:
 - "u32 length = request->length;" is set early in the function
 - length is never updated
 - at the end, we have "eeprom->len = length"

In this case, the existence of length seems at least seems redundant, unless I missed something?

For the reference, the function was added in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=96d971e307cc0e434f96329b42bbd98cfbca07d2
Later https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a3bb7b63813f674fb62bac321cdd897cc62de094 changed ETH_MODULE_SFF_8079 to ETH_MODULE_SFF_8472.

Thanks,
 Krzysztof

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ