Message-ID: <c6519af5-8252-4fdb-86c2-c77cf99c292c@intel.com>
Date: Tue, 21 May 2024 14:05:28 -0700
From: Jacob Keller <jacob.e.keller@...el.com>
To: <kernel.org-fo5k2w@...arbi.fr>, Jeff Daly <jeffd@...icom-usa.com>, "Simon
 Horman" <horms@...nel.org>
CC: <netdev@...r.kernel.org>
Subject: Re: [PATCH net] Revert "ixgbe: Manual AN-37 for troublesome link
 partners for X550 SFI"



On 5/21/2024 10:12 AM, kernel.org-fo5k2w@...arbi.fr wrote:
> If any of you have the skills to develop a patch that tries to satisfy everyone, please know that I'm always available for testing on my hardware. If Jeff also has the possibilities, it's not impossible that we could come to a consensus. All we'd have to do would be to test the behavior of our equipment in the problematic situation.
> 

I would love a solution which fixes both cases. I don't currently have
any idea what it would be.

> Isn't there someone at Intel who can contribute their expertise on the underlying technical reasons for the problem (obviously level 1 OSI) in order to guide us towards a state-of-the-art solution?
> 
> Best regards.
> 

Unfortunately I do not know anyone still here who has the expertise to
solve this. The out-of-tree ixgbe driver does not have the fix from
Silicom, so from Intel's perspective the correct implementation matches
the need for the Cisco switch...

> On 5/21/2024 9:49 AM, Jeff Daly wrote:
>> 
>> 
>>> -----Original Message-----
>>> From: Simon Horman <horms@...nel.org>
>>> Sent: Tuesday, May 21, 2024 12:42 PM
>>> To: Jacob Keller <jacob.e.keller@...el.com>
>>> Cc: netdev@...r.kernel.org; Jeff Daly <jeffd@...icom-usa.com>; kernel.org-
>>> fo5k2w@...arbi.fr
>>> Subject: Re: [PATCH net] Revert "ixgbe: Manual AN-37 for troublesome link
>>> partners for X550 SFI"
>>>
>>> One of those awkward situations where the only known (in this case to Jacob
>>> and me) resolution to a regression is itself a regression (for a different setup).
>>>
>>> I think that in these kind of situations it's best to go back to how things were.
>>>
>>> Reviewed-by: Simon Horman <horms@...nel.org>
>> 
>> In principle, I don't disagree.....  However, our customer was very sensitive to having any patches/workarounds needed for their configuration be part of the upstream.  Aside from maintaining our own patchset (or figuring out whether there's a patch that works for everyone) is there a better solution?
>> 
>> 

We're somewhat stuck between a rock and a hard place here. I don't have
full context for the problem, but I did manage to get a little more
info about this from internal bugs.

Here are the facts as I understand them:

1. The Juniper MX5 switch appears to require clause 37 auto-negotiation
(AN-37) to link.
2. The Cisco 3560CX-12PD-S appears to reject AN-37 as invalid and stops
trying to link once it sees it.
3. As far as I understand, AN-37 is intended for 1G links, and is not
generally supported or used at 10G? It looks like the fix, as applied,
affects all 10G SFP links, which results in the issues with the
Cisco switch.


For context, this document was the best I found from a quick google
search: https://www.ieee802.org/3/by/public/Mar15/booth_3by_01_0315.pdf

It appears the Cisco device is linking at some form of 10G according to
the bug report here:

> show interface status | include Te1/0/1
> Te1/0/1   --- Vers Qotom --- connected    trunk        full    10G SFP-10GBase-CX1


The link is an SFP-10GBase-CX1?

@Jeff, can you provide any further details about the Juniper MX5 switch
case that the original change fixes?

The function being modified here is ixgbe_setup_sfi_x550a, which is
called to set up SFI. Its description says "Used to connect the
internal PHY directly to an SFP cage without auto-negotiation".

It is only called by ixgbe_setup_mac_link_sfp_n, which is supposed to
configure the PHY for native SFP support for IXGBE_DEV_ID_X550EM_A_SFP_N
(0x15C4). No other device type is affected by this change.

Every comment here implies that this path has no auto-negotiation, which
makes it extremely weird to me that we try to enable AN-37 in this flow.

Without more context, my gut instinct is that the Cisco switch is likely
following the general expectations here compared to the Juniper switch.

I also don't see a good way currently to have the driver select between
the options, if both cases are standard SFP. It can't know what it's
linked against. If we try the AN-37 flow with Cisco, it essentially
bricks the link until a reboot. Even reloading with the out-of-tree
driver, which doesn't do this AN-37 flow, fails to recover link. This
makes any sort of "fallback" mechanism unlikely to work.
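
To illustrate why an AN-37-first fallback fails, here is a toy model in
C. It is not driver code: the partner types, the "latched_off" flag and
both function names are made up for illustration, and the only
behaviour it encodes is what the two reports describe (one partner
links only with AN-37, the other stops trying once it has seen AN-37).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical link-partner behaviours, modelled on the two reports. */
enum partner_kind {
	PARTNER_NEEDS_AN37,   /* Juniper MX5-like: links only with AN-37 */
	PARTNER_REJECTS_AN37, /* Cisco 3560CX-like: gives up after seeing AN-37 */
};

struct partner {
	enum partner_kind kind;
	bool latched_off; /* set once the partner has stopped trying to link */
};

/* One link attempt. Returns true if link comes up. */
static bool try_link(struct partner *p, bool use_an37)
{
	assert(p != NULL);

	if (p->latched_off)
		return false; /* nothing the local side does helps any more */

	if (p->kind == PARTNER_REJECTS_AN37) {
		if (use_an37) {
			p->latched_off = true; /* partner refuses from now on */
			return false;
		}
		return true;
	}

	/* PARTNER_NEEDS_AN37: link only comes up when AN-37 is used. */
	return use_an37;
}

/* Naive fallback: try AN-37 first, then retry without it. */
static bool link_with_fallback(struct partner *p)
{
	if (try_link(p, true))
		return true;
	return try_link(p, false);
}
```

In this model, link_with_fallback() brings link up against the
Juniper-like partner, but against the Cisco-like partner the first
attempt sets latched_off and the retry without AN-37 fails as well.
The model only covers the AN-37-first ordering.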

Unless we can find some obvious way to distinguish the two cases, or
there is fundamentally a different fix for the Juniper case, I don't see
how we can support both flows.

I guess there is the option of some sort of toggle, via ethtool or
otherwise, to allow selection... But users might try to enable this when
link is faulty and end up hitting the case where, once we try AN-37, the
remote switch refuses to try again until a power cycle.
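
A rough sketch of what such a toggle could look like, in plain C rather
than driver code. Everything here is hypothetical: no such knob exists
today, and the struct, enum and function names are invented for
illustration. The only value taken from the driver is the 0x15C4 device
ID mentioned above. The idea is that AN-37 stays off unless the user
opts in, and the opt-in is only honoured on the one device that goes
through this SFI setup path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Device ID quoted above for IXGBE_DEV_ID_X550EM_A_SFP_N. */
#define DEV_ID_X550EM_A_SFP_N 0x15C4

enum sfi_flow {
	SFI_FLOW_PLAIN, /* no auto-negotiation, as the SFI comments describe */
	SFI_FLOW_AN37,  /* manual AN-37, as added by the reverted change */
};

struct port_cfg {
	uint16_t device_id;
	bool an37_optin; /* hypothetical user-controlled knob */
};

/* Initialise a port with the opt-in disabled by default. */
static void port_cfg_init(struct port_cfg *cfg, uint16_t device_id)
{
	assert(cfg != NULL);
	cfg->device_id = device_id;
	cfg->an37_optin = false;
}

/* Pick the SFI setup flow: AN-37 only on explicit opt-in for 0x15C4. */
static enum sfi_flow select_sfi_flow(const struct port_cfg *cfg)
{
	assert(cfg != NULL);
	if (cfg->device_id == DEV_ID_X550EM_A_SFP_N && cfg->an37_optin)
		return SFI_FLOW_AN37;
	return SFI_FLOW_PLAIN;
}
```

This keeps the default behaviour matching what the Cisco switch needs,
but it does nothing to stop a user from opting in against a partner
that rejects AN-37, which is exactly the hazard described above.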
