[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c3726cb7-6eff-43c6-a7d4-1e931d48151f@ans.pl>
Date: Tue, 21 May 2024 21:54:48 -0700
From: Krzysztof Olędzki <ole@....pl>
To: Andrew Lunn <andrew@...n.ch>
Cc: Michal Kubecek <mkubecek@...e.cz>, Moshe Shemesh <moshe@...dia.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>, idosch@...dia.com
Subject: Re: "netlink error: Invalid argument" with ethtool-5.13+ on recent
kernels due to "ethtool: Add netlink handler for getmodule (-m)" -
25b64c66f58d3df0ad7272dda91c3ab06fe7a303, also no SFP-DOM support via
netlink?
Hi,
On 21.05.2024 at 13:21, Andrew Lunn wrote:
>> sending genetlink packet (76 bytes):
>> msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
>> ETHTOOL_MSG_MODULE_EEPROM_GET
>> ETHTOOL_A_MODULE_EEPROM_HEADER
>> ETHTOOL_A_HEADER_DEV_NAME = "eth3"
>> ETHTOOL_A_MODULE_EEPROM_LENGTH = 128
>> ETHTOOL_A_MODULE_EEPROM_OFFSET = 128
>> ETHTOOL_A_MODULE_EEPROM_PAGE = 3
>> ETHTOOL_A_MODULE_EEPROM_BANK = 0
>> ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS = 80
>> received genetlink packet (96 bytes):
>> msg length 96 error errno=-22
>
> This is a mellanox card right?
Yes, sorry. This is indeed Mellanox (now Nvidia) CX3 / CX3Pro, using the drivers/net/ethernet/mellanox/mlx4 driver.
> mlx4_en_get_module_info() and mlx4_en_get_module_eeprom() implement
> the old API for reading data from an SFP module. So the ethtool core
> will be mapping the new API to the old API. The interesting function
> is probably fallback_set_params():
>
> https://elixir.bootlin.com/linux/latest/source/net/ethtool/eeprom.c#L29
>
> and my guess is, you are hitting:
>
> if (offset >= modinfo->eeprom_len)
> return -EINVAL;
>
> offset is 3 * 128 + 128 = 512.
>
> mlx4_en_get_module_info() is probably returning eeprom_len of 256?
>
> Could you verify this?
Ah, excellent catch Andrew!
# egrep -R 'ETH_MODULE_SFF_[0-9]+_LEN' include/uapi/linux/ethtool.h
#define ETH_MODULE_SFF_8079_LEN 256
#define ETH_MODULE_SFF_8472_LEN 512
#define ETH_MODULE_SFF_8636_LEN 256
#define ETH_MODULE_SFF_8436_LEN 256
The code in mlx4_en_get_module_info (with length annotation):
switch (data[0] /* identifier */) {
case MLX4_MODULE_ID_QSFP:
modinfo->type = ETH_MODULE_SFF_8436;
modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN; // 256
break;
case MLX4_MODULE_ID_QSFP_PLUS:
if (data[1] >= 0x3) { /* revision id */
modinfo->type = ETH_MODULE_SFF_8636;
modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN; // 256
} else {
modinfo->type = ETH_MODULE_SFF_8436;
modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN; // 256
}
break;
case MLX4_MODULE_ID_QSFP28:
modinfo->type = ETH_MODULE_SFF_8636;
modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN; // 256
break;
case MLX4_MODULE_ID_SFP:
modinfo->type = ETH_MODULE_SFF_8472;
modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN; // 512
break;
default:
return -EINVAL;
}
So right, the function returns 512 for SFP and 256 for everything else, which explains why SFP does work but QSFP - not.
Following your advice, I added some debug printks to net/ethtool/eeprom.c:
@@ -33,16 +33,24 @@
u32 offset = request->offset;
u32 length = request->length;
+ printk("A: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
+
if (request->page)
offset = request->page * ETH_MODULE_EEPROM_PAGE_LEN + offset;
+ printk("B: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
+
if (modinfo->type == ETH_MODULE_SFF_8472 &&
request->i2c_address == 0x51)
offset += ETH_MODULE_EEPROM_PAGE_LEN * 2;
+ printk("C: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
+
if (offset >= modinfo->eeprom_len)
return -EINVAL;
+ printk("D: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
+
eeprom->cmd = ETHTOOL_GMODULEEEPROM;
eeprom->len = length;
eeprom->offset = offset;
Here is the result:
SFP:
A: offset=0, modinfo->eeprom_len=512
B: offset=0, modinfo->eeprom_len=512
C: offset=0, modinfo->eeprom_len=512
D: offset=0, modinfo->eeprom_len=512
A: offset=0, modinfo->eeprom_len=512
B: offset=0, modinfo->eeprom_len=512
C: offset=0, modinfo->eeprom_len=512
D: offset=0, modinfo->eeprom_len=512
QSFP:
A: offset=0, modinfo->eeprom_len=256
B: offset=0, modinfo->eeprom_len=256
C: offset=0, modinfo->eeprom_len=256
D: offset=0, modinfo->eeprom_len=256
A: offset=0, modinfo->eeprom_len=256
B: offset=0, modinfo->eeprom_len=256
C: offset=0, modinfo->eeprom_len=256
D: offset=0, modinfo->eeprom_len=256
A: offset=128, modinfo->eeprom_len=256
B: offset=128, modinfo->eeprom_len=256
C: offset=128, modinfo->eeprom_len=256
D: offset=128, modinfo->eeprom_len=256
A: offset=128, modinfo->eeprom_len=256
B: offset=512, modinfo->eeprom_len=256
C: offset=512, modinfo->eeprom_len=256
Note - no "D" as -EINVAL is returned exactly as you predicted.
BTW: there is another suspicious looking thing in this code:
- "u32 length = request->length;" is set early in the function
- length is never updated
- at the end, we have "eeprom->len = length"
In this case, the existence of length seems at least seems redundant, unless I missed something?
For the reference, the function was added in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=96d971e307cc0e434f96329b42bbd98cfbca07d2
Later https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a3bb7b63813f674fb62bac321cdd897cc62de094 changed ETH_MODULE_SFF_8079 to ETH_MODULE_SFF_8472.
Thanks,
Krzysztof
Powered by blists - more mailing lists