[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zk2vfmI7qnBMxABo@shredder>
Date: Wed, 22 May 2024 11:40:30 +0300
From: Ido Schimmel <idosch@...dia.com>
To: Krzysztof Olędzki <ole@....pl>
Cc: Andrew Lunn <andrew@...n.ch>, Michal Kubecek <mkubecek@...e.cz>,
Moshe Shemesh <moshe@...dia.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
tariqt@...dia.com
Subject: Re: "netlink error: Invalid argument" with ethtool-5.13+ on recent
kernels due to "ethtool: Add netlink handler for getmodule (-m)" -
25b64c66f58d3df0ad7272dda91c3ab06fe7a303, also no SFP-DOM support via
netlink?
On Tue, May 21, 2024 at 09:54:48PM -0700, Krzysztof Olędzki wrote:
> Hi,
>
> On 21.05.2024 at 13:21, Andrew Lunn wrote:
> >> sending genetlink packet (76 bytes):
> >> msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
> >> ETHTOOL_MSG_MODULE_EEPROM_GET
> >> ETHTOOL_A_MODULE_EEPROM_HEADER
> >> ETHTOOL_A_HEADER_DEV_NAME = "eth3"
> >> ETHTOOL_A_MODULE_EEPROM_LENGTH = 128
> >> ETHTOOL_A_MODULE_EEPROM_OFFSET = 128
> >> ETHTOOL_A_MODULE_EEPROM_PAGE = 3
> >> ETHTOOL_A_MODULE_EEPROM_BANK = 0
> >> ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS = 80
> >> received genetlink packet (96 bytes):
> >> msg length 96 error errno=-22
> >
> > This is a mellanox card right?
>
> Yes, sorry. This is indeed Mellanox (now Nvidia) CX3 / CX3Pro, using the drivers/net/ethernet/mellanox/mlx4 driver.
>
> > mlx4_en_get_module_info() and mlx4_en_get_module_eeprom() implement
> > the old API for reading data from an SFP module. So the ethtool core
> > will be mapping the new API to the old API. The interesting function
> > is probably fallback_set_params():
> >
> > https://elixir.bootlin.com/linux/latest/source/net/ethtool/eeprom.c#L29
> >
> > and my guess is, you are hitting:
> >
> > if (offset >= modinfo->eeprom_len)
> > return -EINVAL;
> >
> > offset is 3 * 128 + 128 = 512.
> >
> > mlx4_en_get_module_info() is probably returning eeprom_len of 256?
> >
> > Could you verify this?
>
> Ah, excellent catch Andrew!
Yes, I believe Andrew's analysis is correct.
>
> # egrep -R 'ETH_MODULE_SFF_[0-9]+_LEN' include/uapi/linux/ethtool.h
> #define ETH_MODULE_SFF_8079_LEN 256
> #define ETH_MODULE_SFF_8472_LEN 512
> #define ETH_MODULE_SFF_8636_LEN 256
> #define ETH_MODULE_SFF_8436_LEN 256
>
> The code in mlx4_en_get_module_info (with length annotation):
>
> switch (data[0] /* identifier */) {
> case MLX4_MODULE_ID_QSFP:
> modinfo->type = ETH_MODULE_SFF_8436;
> modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN; // 256
> break;
> case MLX4_MODULE_ID_QSFP_PLUS:
> if (data[1] >= 0x3) { /* revision id */
> modinfo->type = ETH_MODULE_SFF_8636;
> modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN; // 256
> } else {
> modinfo->type = ETH_MODULE_SFF_8436;
> modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN; // 256
> }
> break;
> case MLX4_MODULE_ID_QSFP28:
> modinfo->type = ETH_MODULE_SFF_8636;
> modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN; // 256
> break;
> case MLX4_MODULE_ID_SFP:
> modinfo->type = ETH_MODULE_SFF_8472;
> modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN; // 512
> break;
> default:
> return -EINVAL;
> }
>
> So right, the function returns 512 for SFP and 256 for everything else, which explains why SFP does work but QSFP - not.
Since you already did all the work and you are able to test patches, do
you want to fix it yourself and submit or report to the mlx4 maintainer
(copied)? Fix should be similar to mlx5 commit a708fb7b1f8d ("net/mlx5e:
ethtool, Add support for EEPROM high pages query").
>
> Following your advice, I added some debug printks to net/ethtool/eeprom.c:
>
> @@ -33,16 +33,24 @@
> u32 offset = request->offset;
> u32 length = request->length;
>
> + printk("A: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
> +
> if (request->page)
> offset = request->page * ETH_MODULE_EEPROM_PAGE_LEN + offset;
>
> + printk("B: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
> +
> if (modinfo->type == ETH_MODULE_SFF_8472 &&
> request->i2c_address == 0x51)
> offset += ETH_MODULE_EEPROM_PAGE_LEN * 2;
>
> + printk("C: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
> +
> if (offset >= modinfo->eeprom_len)
> return -EINVAL;
>
> + printk("D: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
> +
> eeprom->cmd = ETHTOOL_GMODULEEEPROM;
> eeprom->len = length;
> eeprom->offset = offset;
>
> Here is the result:
>
> SFP:
> A: offset=0, modinfo->eeprom_len=512
> B: offset=0, modinfo->eeprom_len=512
> C: offset=0, modinfo->eeprom_len=512
> D: offset=0, modinfo->eeprom_len=512
> A: offset=0, modinfo->eeprom_len=512
> B: offset=0, modinfo->eeprom_len=512
> C: offset=0, modinfo->eeprom_len=512
> D: offset=0, modinfo->eeprom_len=512
>
> QSFP:
> A: offset=0, modinfo->eeprom_len=256
> B: offset=0, modinfo->eeprom_len=256
> C: offset=0, modinfo->eeprom_len=256
> D: offset=0, modinfo->eeprom_len=256
>
> A: offset=0, modinfo->eeprom_len=256
> B: offset=0, modinfo->eeprom_len=256
> C: offset=0, modinfo->eeprom_len=256
> D: offset=0, modinfo->eeprom_len=256
>
> A: offset=128, modinfo->eeprom_len=256
> B: offset=128, modinfo->eeprom_len=256
> C: offset=128, modinfo->eeprom_len=256
> D: offset=128, modinfo->eeprom_len=256
>
> A: offset=128, modinfo->eeprom_len=256
> B: offset=512, modinfo->eeprom_len=256
> C: offset=512, modinfo->eeprom_len=256
> Note - no "D" as -EINVAL is returned exactly as you predicted.
>
> BTW: there is another suspicious looking thing in this code:
> - "u32 length = request->length;" is set early in the function
> - length is never updated
> - at the end, we have "eeprom->len = length"
>
> In this case, the existence of length seems at least seems redundant, unless I missed something?
Looks like it
>
> For the reference, the function was added in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=96d971e307cc0e434f96329b42bbd98cfbca07d2
> Later https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a3bb7b63813f674fb62bac321cdd897cc62de094 changed ETH_MODULE_SFF_8079 to ETH_MODULE_SFF_8472.
>
> Thanks,
> Krzysztof
Powered by blists - more mailing lists