Date:   Sun, 10 Nov 2019 19:47:25 +0000
From:   Russell King - ARM Linux admin <linux@...linux.org.uk>
To:     Andrew Lunn <andrew@...n.ch>
Cc:     Florian Fainelli <f.fainelli@...il.com>,
        Heiner Kallweit <hkallweit1@...il.com>,
        "David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org
Subject: Re: [PATCH net-next 00/17] Allow slow to initialise GPON modules to
 work

On Sun, Nov 10, 2019 at 06:52:17PM +0100, Andrew Lunn wrote:
> On Sun, Nov 10, 2019 at 02:05:30PM +0000, Russell King - ARM Linux admin wrote:
> > Some GPON modules take longer than the SFF MSA-specified time to
> > initialise and respond to I2C bus transactions, either at both the
> > 0x50 and 0x51 addresses or at 0x51 alone.  Technically these modules
> > are non-compliant with the SFP Multi-Source Agreement, but they have
> > been around for some time, so they are difficult to just ignore.
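For illustration only, a minimal sketch of the general idea of tolerating slow modules: instead of giving up after the MSA-specified time, poll each EEPROM address (0x50 for the ID page, 0x51 for diagnostics) until it ACKs or a longer, bounded timeout expires. The simulated module, the polling step and every name below are hypothetical assumptions; time is simulated rather than slept, and this is not the code from the patch series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated module: it NAKs I2C transactions on an address
 * until the given time (ms since power-up) has passed. */
struct sim_module {
    unsigned int ready_ms_a0;   /* when address 0x50 starts responding */
    unsigned int ready_ms_a2;   /* when address 0x51 starts responding */
};

/* Returns true if a transaction at `addr` would be ACKed at now_ms. */
static bool sim_i2c_ack(const struct sim_module *m, uint8_t addr,
                        unsigned int now_ms)
{
    if (addr == 0x50)
        return now_ms >= m->ready_ms_a0;
    if (addr == 0x51)
        return now_ms >= m->ready_ms_a2;
    return false;
}

/* Poll `addr` every poll_ms (must be non-zero) starting at start_ms until
 * it ACKs or timeout_ms has elapsed.  Returns the time at which the
 * address first responded, or -1 on timeout. */
static int wait_for_address(const struct sim_module *m, uint8_t addr,
                            unsigned int start_ms, unsigned int poll_ms,
                            unsigned int timeout_ms)
{
    for (unsigned int t = start_ms; t - start_ms <= timeout_ms; t += poll_ms)
        if (sim_i2c_ack(m, addr, t))
            return (int)t;
    return -1;
}
```

With a module whose 0x51 page only appears after 2.5 s, a 1 s timeout gives up, while a 5 s timeout succeeds; choosing that upper bound is the real policy question.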
> 
> Hi Russell
> 
> We are seeing quite a few SFP/SFF modules that violate the spec. Do you
> think there is any value in naming and shaming, in the kernel logs, SFPs
> that don't conform to the standard? If you need to wait longer than 1
> second for the EEPROM to become readable, print the vendor name from
> the EEPROM and warn that it is not conformant. If the diagnostic page
> is not immediately available, again, print the vendor name and warn
> that it is not conformant?
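A rough sketch of what the suggested warning might look like, assuming the SFF-8472 layout in which the vendor name occupies bytes 20-35 of the 0x50 (A0h) page as space-padded ASCII. The 1 second threshold comes from the suggestion above; the function names and the message text are hypothetical, not existing kernel code.

```c
#include <stdio.h>
#include <string.h>

#define SFP_VENDOR_NAME_OFF 20   /* SFF-8472 A0h bytes 20-35 */
#define SFP_VENDOR_NAME_LEN 16

/* Copy the space-padded vendor name out of an A0h EEPROM dump into a
 * NUL-terminated string with the trailing padding removed. */
static void sfp_vendor_name(const unsigned char *a0,
                            char out[SFP_VENDOR_NAME_LEN + 1])
{
    size_t len = SFP_VENDOR_NAME_LEN;

    memcpy(out, a0 + SFP_VENDOR_NAME_OFF, len);
    while (len > 0 && out[len - 1] == ' ')
        len--;
    out[len] = '\0';
}

/* Format a warning if the EEPROM took longer than 1 s to become readable.
 * Returns the snprintf() result, or 0 if no warning is due. */
static int format_slow_warning(const unsigned char *a0, unsigned int waited_ms,
                               char *buf, size_t buflen)
{
    char vendor[SFP_VENDOR_NAME_LEN + 1];

    if (waited_ms <= 1000)
        return 0;
    sfp_vendor_name(a0, vendor);
    return snprintf(buf, buflen,
                    "module from \"%s\" took %u ms to become readable; "
                    "not SFP MSA conformant", vendor, waited_ms);
}
```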

I really don't think it will achieve anything.  Once something is
established in the market, it's difficult to get the vendor to change
it.

In some cases, it's not possible to change it without an entire
hardware redesign, and we're not going to achieve that by "naming and
shaming" when most of the places this will be used are embedded devices
where hardly anyone looks at the kernel message log.

It is annoying that there are these modules that do not conform, but
we have many instances in the kernel of hardware that doesn't quite
conform, yet we still make it work.

I have another fun case with another module: a copper SFP+ module that
has a Broadcom Clause 45 NBASE-T PHY on it.  On reset, it quickly
becomes accessible via the I2C bus, but it hasn't finished
initialising.  We're so quick in the kernel that we read the IDs,
bind the PHY driver for it and attempt to set the advertisements, but
because the PHY hasn't finished initialising, the kernel's
advertisements get overwritten by the PHY (presumably by EEPROM loading,
or maybe by PHY firmware initialisation).
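One conceivable, hypothetical workaround for this race is to read the advertisement back for a guard window after writing it, and re-apply it if the PHY has overwritten it. The PHY model below is a simulation with arbitrary register values; nothing here is taken from the Broadcom PHY or from phylib, and the approach only helps if the guard window outlasts the PHY's internal initialisation, which is not known in advance.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a PHY that answers on the bus before it has
 * finished initialising: at init_done_ms it reloads its default
 * advertisement, discarding whatever was written before. */
struct sim_phy {
    unsigned int init_done_ms;
    uint16_t default_adv;
    uint16_t adv;
    bool reloaded;
};

/* Advance the simulated PHY to time now_ms. */
static void sim_phy_tick(struct sim_phy *p, unsigned int now_ms)
{
    if (!p->reloaded && now_ms >= p->init_done_ms) {
        p->adv = p->default_adv;
        p->reloaded = true;
    }
}

/* Write `want`, then read it back every poll_ms (non-zero) for guard_ms.
 * If the PHY overwrites it, write it again and restart the guard window.
 * Returns the number of writes needed, or -1 after max_writes. */
static int set_adv_guarded(struct sim_phy *p, uint16_t want,
                           unsigned int start_ms, unsigned int poll_ms,
                           unsigned int guard_ms, int max_writes)
{
    unsigned int now = start_ms;

    for (int writes = 1; writes <= max_writes; writes++) {
        unsigned int written_at = now;
        bool lost = false;

        sim_phy_tick(p, now);
        p->adv = want;                      /* "set the advertisement" */
        while (now - written_at < guard_ms) {
            now += poll_ms;
            sim_phy_tick(p, now);
            if (p->adv != want) {           /* overwritten by the PHY */
                lost = true;
                break;
            }
        }
        if (!lost)
            return writes;
    }
    return -1;
}
```

With init_done_ms = 150 and a 500 ms guard, the first write is lost and re-applied (two writes). With init_done_ms = 1000 the guard passes after one write and the later overwrite goes unnoticed, which is exactly the weakness of guessing a timeout.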

I've (very annoyingly) stumbled over the fact that OpenWRT is carrying
a patch for the SFP code to deal with PHYs that need longer to
initialise, but as far as I can see there has been no upstream report
of it:

https://github.com/openwrt/openwrt/blob/master/target/linux/mvebu/patches-4.19/450-reprobe_sfp_phy.patch

That may be due to us not quite checking the TX_FAULT line correctly
(which these patches change).  We had assumed that the PHY would
always be available after 50ms, but the spec actually says that
modules are allowed 300ms to start up (even longer, 90s, for cooled
laser optical modules, for which I haven't published the patches).
The end of startup is signalled by TX_FAULT being deasserted.  However,
the copper PHYs I've seen so far tie TX_FAULT to ground on the module,
and I have no cooled laser modules to test with.  Maybe all copper PHY
modules are non-compliant...
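The startup rule described above can be modelled roughly as follows: poll TX_FAULT until it deasserts or the allowance (300 ms, or 90 s for cooled lasers) runs out. The simulation and all names are hypothetical assumptions, not the sfp.c state machine. Note how a module that ties TX_FAULT to ground looks "ready" at t = 0, so the signal says nothing about when its PHY has finished initialising.

```c
#include <stdbool.h>

#define T_INIT_MS        300     /* module startup allowance */
#define T_INIT_COOLED_MS 90000   /* cooled laser optical modules */

/* Hypothetical model: TX_FAULT is asserted from power-up until the module
 * finishes initialising at ready_ms.  A module that ties TX_FAULT to
 * ground reports "deasserted" from t = 0. */
struct sim_txfault {
    bool tied_to_ground;
    unsigned int ready_ms;
};

static bool tx_fault_asserted(const struct sim_txfault *m, unsigned int t)
{
    return !m->tied_to_ground && t < m->ready_ms;
}

/* Poll every poll_ms (non-zero).  Returns the time at which startup is
 * considered complete, or -1 if TX_FAULT is still asserted when
 * limit_ms runs out. */
static int wait_startup(const struct sim_txfault *m, unsigned int limit_ms,
                        unsigned int poll_ms)
{
    for (unsigned int t = 0; t <= limit_ms; t += poll_ms)
        if (!tx_fault_asserted(m, t))
            return (int)t;
    return -1;
}
```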

Also, remember that there is nothing in the EEPROM which tells us what
mode the host serdes should operate in; we work that out by best-guessing
today, and so far we're getting away with it.  There are, however,
1G-only copper SFP modules that use 1000BASE-X, while their
1G/100M/10M cousins use SGMII by default; the only other difference
between them is the part number.
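A simplified, hypothetical sketch of best-guessing the host serdes mode, with a per-part-number quirk table for cases like the 1G-only copper modules above. The part number and the generic rules are made up for illustration and do not reproduce the kernel's actual sfp-bus heuristics.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum serdes_mode { MODE_1000BASEX, MODE_SGMII, MODE_10GBASER };

/* Hypothetical quirk entry: a part number whose host interface differs
 * from what the generic guess would pick. */
struct mode_quirk {
    const char *vendor_pn;
    enum serdes_mode mode;
};

/* Illustrative entry only; this part number is made up. */
static const struct mode_quirk quirks[] = {
    { "EXAMPLE-1G-T", MODE_1000BASEX },   /* 1G-only copper variant */
};

/* Best guess: consult the quirk table first, otherwise assume SFP+ uses
 * 10GBASE-R, copper SFP uses SGMII and everything else 1000BASE-X. */
static enum serdes_mode guess_mode(const char *vendor_pn, bool copper,
                                   bool sfp_plus)
{
    for (size_t i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++)
        if (strcmp(quirks[i].vendor_pn, vendor_pn) == 0)
            return quirks[i].mode;
    if (sfp_plus)
        return MODE_10GBASER;
    return copper ? MODE_SGMII : MODE_1000BASEX;
}
```

The design point is that the part number is the only key available, so every such module needs an explicit entry; the generic fallback can only ever be a guess.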

What I'm basically saying is that relying on the "specification" is
all well and good, but if we implemented the letter of the spec, we
would only allow 1000BASE-X with SFP cages and 10GBASE-R with SFP+
cages, and wouldn't have copper SFPs working.  Technically, the majority
of those on the market are non-compliant.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
