lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9eaa16ac88aafb6dee36c5781ae4de7881bb03a2@ycharbi.fr>
Date: Thu, 09 May 2024 13:26:37 +0000
From: kernel.org-fo5k2w@...arbi.fr
To: "Jacob Keller" <jacob.e.keller@...el.com>, kernel.org-fo5k2w@...arbi.fr,
 "Jeff Daly" <jeffd@...icom-usa.com>
Cc: anthony.l.nguyen@...el.com, intel-wired-lan@...ts.osuosl.org,
 jesse.brandeburg@...el.com, netdev@...r.kernel.org, regressions@...mhuis.info
Subject: Re: Non-functional ixgbe driver between Intel X553 chipset and Cisco
 switch via kernel >=6.1 under Debian

Hi,

> No link detected, but it does detect this is a 10GBaseT cable.
> Interesting it doesn't report FEC or autonegotiation. Hmm.

In fact, I personally find it strange that the "Supported link modes" is "10000baseT/Full". A DAC is not a SFP+ 8P8C (RJ45) module. Wouldn't it be more logical if the modes reported were the same as those obtained by an "ethtool eth2" on the Connectx-3 side? :

Settings for eth2:
	Supported ports: [ FIBRE ]
	Supported link modes:   10000baseKX4/Full
	                        1000baseX/Full
	                        10000baseCR/Full
	                        10000baseSR/Full
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: No
	Supported FEC modes: Not reported
	Advertised link modes:  10000baseKX4/Full
	                        1000baseX/Full
	                        10000baseCR/Full
	                        10000baseSR/Full
	Advertised pause frame use: Symmetric
	Advertised auto-negotiation: No
	Advertised FEC modes: Not reported
	Speed: 10000Mb/s
	Duplex: Full
	Auto-negotiation: off
	Port: Direct Attach Copper
	PHYAD: 0
	Transceiver: internal
	Supports Wake-on: d
	Wake-on: d
        Current message level: 0x00000014 (20)
                               link ifdown
	Link detected: yes


In other words, isn't the fact that the reported mode is "10000baseT/Full" a bug in itself?
 
> Knowing the kernel is the important part, we don't have specific
> versioning of drivers in the kernel anymore.

Ok. I take note of this information.

> The steps would require that you build the kernel manually. I can
> outline the steps i would take here
> 
> 1. get the kernel source from git.kernel.org. I place it in $HOME/git/linux
> 2. switch to v5.10 with 'git switch --detach v5.10'
> 2. copy the debian 5.10 config file to $HOME/git/linux/.config
> 3. build kernel with 'make -j24' (adjust -j depending on how much CPU
> you want to spend building the kernel)
> 4. install with 'sudo make -j24 modules_install && sudo make install'
> 5. reboot and select the v5.10 kernel, double check it works.
> 6. in $HOME/git/linux run 'git bisect start' to initiate the bisect session.
> 7. First, label the current v5.10 commit as good with 'git bisect good'
> 8. Second, label the v6.1 commit as bad with 'git bisect bad v6.1'
> 
> This will initiate a bisect session and will checkout the kernel
> approximately halfway between v5.10 and v6.1. For each bisection point
> it checks, run the following steps:
> 
> 1. 'make olddefconfig' to update the configuration for this version
> 2. 'make -j24' to rebuild with the current version
> 3. 'sudo make -j24 modules_install && sudo make install' to install this
> version.
> 4. reboot into that version and check its behavior.
> 5. If it works properly then run 'git bisect good'
> 6. If it works incorrectly, then run 'git bisect bad'
> 
> A new commit will be selected. It will pick one between the latest good
> point and the closest bad point, essentially honing in towards the
> incorrect behavior.
> 
> If for any reason a commit can't be built or tested, you can use "git
> bisect skip" and it will skip around a bit to find another point that
> can be tried.

Thank you for your and Thorsten Leemhuis's advice. I don't know whether the following Bisect log will be of any help to you. However, I have determined precisely that the problem was introduced with version 6.1. If I boot into 6.0, it works perfectly. So there are fewer differences to search for the problem. Here's the feedback from Bisect, but I'm still dubious about the relevance of this log because the “git bisect bad v6.1” command returned "7614896350aa20764c5eca527262d9eb0a57da63 était à la fois good et bad"... I didn't really understand how it all worked... :

git bisect start
# good: [4fe89d07dcc2804c8b562f6c7896a45643d34b2f] Linux 6.0
git bisect good 45eb8ae5370d5df1ee8236f45df3f29103ba6e12
# bad: [830b3c68c1fb1e9176028d02ef86f3cf76aa2476] Linux 6.1
git bisect bad 7614896350aa20764c5eca527262d9eb0a57da63

I should point out that I had to switch back to Debian 11 because 12 and 13 refuse to compile these old kernels... Anyway, I compiled the versions successively and came across the difference in operation between 6.0 and 6.1.

> I suspect those changes must have broken the Cisco switch link behavior.
> I unfortunately do not know enough about this hardware or the SFI
> configuration to understand why this causes it.
> 
> If you don't want to try bisect, I would suggest trying to revert that
> commit or simply replace the ixgbe_setup_sfi_x550a function with the one
> from out-of-tree here. If you do that, you can rebuild just ixgbe with
> "make M=drivers/net/ethernet/intel/ixgbe" and then insert the module
> with "insmod drivers/net/ethernet/intel/ixgbe/ixgbe.ko".
> 
> It seems likely that this change had unintended side effect which broke
> the Cisco switch linking.


If I do a "git revert 565736048bd5f9888990569993c6b6bfdf6dcb6d" to go back before the state of the suspected problem commit, compile kernel 6.1 and boot on it, it works perfectly.
So it turns out that this is the source of the malfunction and was introduced with Linux 6.1.

 
> I've added Jeff Daly, in the hopes that he could provide more details on
> the change.
> 
> @Jeff, it seems likely that the change you made at 565736048bd5 ("ixgbe:
> Manual AN-37 for troublesome link partners for X550 SFI") is breaking
> some other switches. It would help if you could shed some light on this
> change as otherwise we might need to revert it and once again break the
> setup you fixed.
> 
> Thanks,
> Jake

Let me know if you need more information. I'll be happy to help!

Best regards.

⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Yohan Charbi
⢿⡄⠘⠷⠚⠋⠀ Cordialement
⠈⠳⣄⠀

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ