lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 22 May 2024 11:40:30 +0300
From: Ido Schimmel <idosch@...dia.com>
To: Krzysztof Olędzki <ole@....pl>
Cc: Andrew Lunn <andrew@...n.ch>, Michal Kubecek <mkubecek@...e.cz>,
	Moshe Shemesh <moshe@...dia.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	tariqt@...dia.com
Subject: Re: "netlink error: Invalid argument" with ethtool-5.13+ on recent
 kernels due to "ethtool: Add netlink handler for getmodule (-m)" -
 25b64c66f58d3df0ad7272dda91c3ab06fe7a303, also no SFP-DOM support via
 netlink?

On Tue, May 21, 2024 at 09:54:48PM -0700, Krzysztof Olędzki wrote:
> Hi,
> 
> On 21.05.2024 at 13:21, Andrew Lunn wrote:
> >> sending genetlink packet (76 bytes):
> >>     msg length 76 ethool ETHTOOL_MSG_MODULE_EEPROM_GET
> >>     ETHTOOL_MSG_MODULE_EEPROM_GET
> >>         ETHTOOL_A_MODULE_EEPROM_HEADER
> >>             ETHTOOL_A_HEADER_DEV_NAME = "eth3"
> >>         ETHTOOL_A_MODULE_EEPROM_LENGTH = 128
> >>         ETHTOOL_A_MODULE_EEPROM_OFFSET = 128
> >>         ETHTOOL_A_MODULE_EEPROM_PAGE = 3
> >>         ETHTOOL_A_MODULE_EEPROM_BANK = 0
> >>         ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS = 80
> >> received genetlink packet (96 bytes):
> >>     msg length 96 error errno=-22
> > 
> > This is a mellanox card right?
> 
> Yes, sorry. This is indeed Mellanox (now Nvidia) CX3 / CX3Pro, using the drivers/net/ethernet/mellanox/mlx4 driver.
> 
> > mlx4_en_get_module_info() and mlx4_en_get_module_eeprom() implement
> > the old API for reading data from an SFP module. So the ethtool core
> > will be mapping the new API to the old API. The interesting function
> > is probably fallback_set_params():
> > 
> > https://elixir.bootlin.com/linux/latest/source/net/ethtool/eeprom.c#L29
> > 
> > and my guess is, you are hitting:
> > 
> > 	if (offset >= modinfo->eeprom_len)
> > 		return -EINVAL;
> > 
> > offset is 3 * 128 + 128 = 512.
> > 
> > mlx4_en_get_module_info() is probably returning eeprom_len of 256?
> > 
> > Could you verify this?
> 
> Ah, excellent catch Andrew!

Yes, I believe Andrew's analysis is correct.

> 
> # egrep -R 'ETH_MODULE_SFF_[0-9]+_LEN' include/uapi/linux/ethtool.h
> #define ETH_MODULE_SFF_8079_LEN         256
> #define ETH_MODULE_SFF_8472_LEN         512
> #define ETH_MODULE_SFF_8636_LEN         256
> #define ETH_MODULE_SFF_8436_LEN         256
> 
> The code in mlx4_en_get_module_info (with length annotation):
> 
>         switch (data[0] /* identifier */) {
>         case MLX4_MODULE_ID_QSFP:
>                 modinfo->type = ETH_MODULE_SFF_8436;
>                 modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN;		// 256
>                 break;
>         case MLX4_MODULE_ID_QSFP_PLUS:
>                 if (data[1] >= 0x3) { /* revision id */
>                         modinfo->type = ETH_MODULE_SFF_8636;
>                         modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;	// 256
>                 } else {
>                         modinfo->type = ETH_MODULE_SFF_8436;
>                         modinfo->eeprom_len = ETH_MODULE_SFF_8436_LEN;	// 256
>                 }
>                 break;
>         case MLX4_MODULE_ID_QSFP28:
>                 modinfo->type = ETH_MODULE_SFF_8636;
>                 modinfo->eeprom_len = ETH_MODULE_SFF_8636_LEN;		// 256
>                 break;
>         case MLX4_MODULE_ID_SFP:
>                 modinfo->type = ETH_MODULE_SFF_8472;
>                 modinfo->eeprom_len = ETH_MODULE_SFF_8472_LEN;		// 512
>                 break;
>         default:
>                 return -EINVAL;
>         }
> 
> So right, the function returns 512 for SFP and 256 for everything else, which explains why SFP does work but QSFP - not.

Since you already did all the work and you are able to test patches, do
you want to fix it yourself and submit or report to the mlx4 maintainer
(copied)? Fix should be similar to mlx5 commit a708fb7b1f8d ("net/mlx5e:
ethtool, Add support for EEPROM high pages query").

> 
> Following your advice, I added some debug printks to net/ethtool/eeprom.c:
> 
> @@ -33,16 +33,24 @@
>         u32 offset = request->offset;
>         u32 length = request->length;
> 
> +       printk("A: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
> +
>         if (request->page)
>                 offset = request->page * ETH_MODULE_EEPROM_PAGE_LEN + offset;
> 
> +       printk("B: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
> +
>         if (modinfo->type == ETH_MODULE_SFF_8472 &&
>             request->i2c_address == 0x51)
>                 offset += ETH_MODULE_EEPROM_PAGE_LEN * 2;
> 
> +       printk("C: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
> +
>         if (offset >= modinfo->eeprom_len)
>                 return -EINVAL;
> 
> +       printk("D: offset=%u, modinfo->eeprom_len=%u\n", offset, modinfo->eeprom_len);
> +
>         eeprom->cmd = ETHTOOL_GMODULEEEPROM;
>         eeprom->len = length;
>         eeprom->offset = offset;
> 
> Here is the result:
> 
> SFP:
> A: offset=0, modinfo->eeprom_len=512
> B: offset=0, modinfo->eeprom_len=512
> C: offset=0, modinfo->eeprom_len=512
> D: offset=0, modinfo->eeprom_len=512
> A: offset=0, modinfo->eeprom_len=512
> B: offset=0, modinfo->eeprom_len=512
> C: offset=0, modinfo->eeprom_len=512
> D: offset=0, modinfo->eeprom_len=512
> 
> QSFP:
> A: offset=0, modinfo->eeprom_len=256
> B: offset=0, modinfo->eeprom_len=256
> C: offset=0, modinfo->eeprom_len=256
> D: offset=0, modinfo->eeprom_len=256
> 
> A: offset=0, modinfo->eeprom_len=256
> B: offset=0, modinfo->eeprom_len=256
> C: offset=0, modinfo->eeprom_len=256
> D: offset=0, modinfo->eeprom_len=256
> 
> A: offset=128, modinfo->eeprom_len=256
> B: offset=128, modinfo->eeprom_len=256
> C: offset=128, modinfo->eeprom_len=256
> D: offset=128, modinfo->eeprom_len=256
> 
> A: offset=128, modinfo->eeprom_len=256
> B: offset=512, modinfo->eeprom_len=256
> C: offset=512, modinfo->eeprom_len=256
> Note - no "D" as -EINVAL is returned exactly as you predicted.
> 
> BTW: there is another suspicious looking thing in this code:
>  - "u32 length = request->length;" is set early in the function
>  - length is never updated
>  - at the end, we have "eeprom->len = length"
> 
> In this case, the existence of length seems at least seems redundant, unless I missed something?

Looks like it

> 
> For the reference, the function was added in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=96d971e307cc0e434f96329b42bbd98cfbca07d2
> Later https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a3bb7b63813f674fb62bac321cdd897cc62de094 changed ETH_MODULE_SFF_8079 to ETH_MODULE_SFF_8472.
> 
> Thanks,
>  Krzysztof

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ