Message-ID: <cbe874db-9ac9-42b8-afa0-88ea910e1e99@intel.com>
Date: Fri, 3 May 2024 11:37:10 -0700
From: Jacob Keller <jacob.e.keller@...el.com>
To: <kernel.org-fo5k2w@...arbi.fr>, <jesse.brandeburg@...el.com>,
	<anthony.l.nguyen@...el.com>
CC: <intel-wired-lan@...ts.osuosl.org>, <netdev@...r.kernel.org>
Subject: Re: Non-functional ixgbe driver between Intel X553 chipset and Cisco
 switch via kernel >=6.1 under Debian



On 2/15/2024 3:02 AM, kernel.org-fo5k2w@...arbi.fr wrote:
> Hello,
> 
> (Please note that I don't speak English, sorry if the translation is not faithful to your language)
> 

Hi,

I haven't touched the ixgbe driver and hardware in many years, but I'll
try to see what I can do to help.

> Following Bjorn Helgaas's advice (https://bugzilla.kernel.org/show_bug.cgi?id=218050#c14), I'm coming to you in the hope of finding a solution to a problem encountered by several users of the ixgbe driver. The subject has been discussed in the messages and comments on the following pages:
> https://marc.info/?l=linux-netdev&m=170118007007901&w=2
> https://forum.proxmox.com/threads/intel-x553-sfp-ixgbe-no-go-on-pve8.135129/
> https://www.servethehome.com/the-everything-fanless-home-server-firewall-router-and-nas-appliance-qotom-qnap-teamgroup/
> https://www.servethehome.com/intel-x553-networking-and-proxmox-ve-8-1-3/?unapproved=518173&moderation-hash=e57a05288058d3ff253ceb42e9ada905
> https://forum.proxmox.com/threads/proxmox-8-kernel-6-2-16-4-pve-ixgbe-driver-fails-to-load-due-to-pci-device-probing-failure.131203/
> https://bugzilla.kernel.org/show_bug.cgi?id=218491
> https://bugzilla.kernel.org/show_bug.cgi?id=218050
> 
> Having bought a Qotom Q20332G9-S10 machine with the X553 chipset for testing purposes, I can confirm the connection problem between the PC's X553 SFP+ port and a Cisco switch SFP+ port. In my case, this happens under GNU/Linux Debian 12 - kernel 6.1.76 and Sid - kernel 6.6.13. So it's not specific to Proxmox.
> I should point out that under GNU/Linux Debian 11 - kernel 5.10, the network card (X553 via ixgbe) works without problems. So this is a relatively "recent" bug.
> 
> Here's my test environment:
> - 1 Qotom Q20332G9-S10 (I used a 16GB Intel Optane M10 M.2 SSD with a fresh GNU/Linux Debian 12)
> - 1 Cisco DAC cable (tested with a 1M and a 3M)
> - 1 PC with Mellanox Connectx-3 2x SFP+ network card (running GNU/Linux Debian SID installed several years ago)
> - 1 Cisco 3560CX-12PD-S switch (2 SFP+ ports) with IOS 15.2(7)E2
> 
> Connecting the Qotom Q20332G9-S10 (X553) to the Mellanox Connectx-3 works without a hitch and without any special handling (the linux-image-6.1.0-17-amd64 ixgbe driver works in this configuration). Full 10gbps speeds between the two with an "iperf".
> 

So everything works when connected back to back with the Connectx-3. Ok.

> At this stage, I've ruled out a hardware incompatibility (OSI level 1) since the DAC works with the X553. So there's no need to use compatibility tricks as suggested in the link comments with the "allow_unsupported_sfp=1" parameter. This will be useless in the following tests (I've checked).
> 

To confirm, you use the same cable in both cases?

> Where it gets tricky is when you connect it (the Qotom) to the Cisco switch.
> Before an "ip link set eno1 up", the Cisco raises the link on its side, but the Debian doesn't (link DOWN). After the "ip link set eno1 up", the link drops and never comes back. There does seem to be a driver problem in recent kernels (GNU/Linux Debian Stable and Sid).
> 

But on the switch, the link is reported up until we bring the interface
up in ixgbe, and then link drops and stays down indefinitely?

> After compiling the driver manually (https://downloadmirror.intel.com/812532/ixgbe-5.19.9.tar.gz) following the documentation already shared by others (https://www.xmodulo.com/download-install-ixgbe-driver-ubuntu-debian.html), it works with the Cisco (after a "shut/no shut" of the latter's 10gbe port).
> 
> So we end up with a working machine (I even configured and used the SR-IOV successfully right afterwards).
> 

But if you use the out-of-tree ixgbe driver everything works. Hmm.

> PS: I also tested with Debian Sid
> 
> I've finally tried the commands you were giving Skyler without any result (rmmod ixgbe; modprobe ixgbe; ethtool -S eno1 | grep fault).
> 
> For the moment, the Qotom machine is dedicated to testing, so I'm available to carry out any manipulations you may wish to make to advance the subject.
> Can we work on diagnosing this problem so that the next stable release of Debian is fully functional with this Intel network card?
> 
> Best regards.

I tried checking the out-of-tree versions to see if there were any
obvious fixes, but I didn't find anything. The in-kernel and out-of-tree
code differ so much that it is hard to track down a specific change. At
first I wondered if this might be a regression due to recent changes to
support new hardware, but it appears that v6.1 predates most of that
work.

It may be helpful if you could provide some more information from the
system in the Cisco switch case:

1. The kernel message logs from when you bring up the interface. You can
get this from dmesg or journalctl -k if you have systemd.

2. "ethtool eno1" after you bring the interface up, to see what it
reports about the link.

3. "ethtool -S eno1", to see if any other stats are reported that might
help us isolate what's going on.
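
The three items above could be gathered in one go with something like
the following. This is only a rough sketch: the function names and the
output file name are made up for illustration, and eno1 is simply the
interface name from the report. It likely needs to run as root.

```shell
#!/bin/sh
# Sketch: collect the kernel log, "ethtool IFACE" and "ethtool -S IFACE"
# into one file. Function names and file name are illustrative only.

# Print the diagnostic commands for one interface, one per line.
diag_commands() {
    iface="$1"
    printf '%s\n' \
        "dmesg" \
        "ethtool $iface" \
        "ethtool -S $iface"
}

# Run each command and append its output to a file under a header line.
# A failing command is recorded (stderr included) but does not abort.
collect_diag() {
    iface="$1"
    out="$2"
    : > "$out"
    diag_commands "$iface" | while IFS= read -r cmd; do
        printf '===== %s =====\n' "$cmd" >> "$out"
        sh -c "$cmd" >> "$out" 2>&1 || true
    done
}

# Usage, after "ip link set eno1 up" with the Cisco switch attached:
#   collect_diag eno1 ixgbe-diag.txt
```

On a systemd machine, "journalctl -k" could be substituted for dmesg
in the list.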


Do you happen to know if any particular in-kernel driver version worked?
That would help limit the search for the regressing commit. Ideally, you
could run git bisect on that setup, which would efficiently locate the
change that regressed the behavior.
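
A bisect run might look roughly like the sketch below. It assumes that
mainline v5.10 works and v6.1 fails in the same way as the Debian
kernels from the report, which would need to be confirmed first. The
helper name link_verdict is made up for illustration; it only turns the
"Link detected" line of "ethtool eno1" into a good/bad answer.

```shell
#!/bin/sh
# Sketch of a manual bisect between a working and a failing kernel.
# Assumption: mainline v5.10 is good and v6.1 is bad, as with Debian's kernels.
#
# In a mainline kernel tree:
#   git bisect start
#   git bisect bad v6.1
#   git bisect good v5.10
#
# Then, for each commit git checks out: build and boot that kernel,
# run "ip link set eno1 up" with the Cisco switch attached, wait a bit,
# and report the result:
#   git bisect "$(ethtool eno1 | link_verdict)"

# Read "ethtool IFACE" output on stdin and print "good" if link is up,
# "bad" otherwise (hypothetical helper, not part of any tool).
link_verdict() {
    if grep -q 'Link detected: yes'; then
        echo good
    else
        echo bad
    fi
}
```

If a given commit does not build or boot, "git bisect skip" is the
usual way to move past it.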

Regards,
Jake

> 
> ⢀⣴⠾⠻⢶⣦⠀
> ⣾⠁⢠⠒⠀⣿⡁ Yohan Charbi
> ⢿⡄⠘⠷⠚⠋⠀ Cordialement
> ⠈⠳⣄⠀
> 
