Message-ID: <20200109215903.GV25745@shell.armlinux.org.uk>
Date:   Thu, 9 Jan 2020 21:59:03 +0000
From:   Russell King - ARM Linux admin <linux@...linux.org.uk>
To:     ѽ҉ᶬḳ℠ <vtol@....net>
Cc:     Andrew Lunn <andrew@...n.ch>, netdev@...r.kernel.org
Subject: Re: [drivers/net/phy/sfp] intermittent failure in state machine
 checks

On Thu, Jan 09, 2020 at 07:42:27PM +0000, ѽ҉ᶬḳ℠ wrote:
> On 09/01/2020 19:01, ѽ҉ᶬḳ℠ wrote:
> > On 09/01/2020 17:43, Russell King - ARM Linux admin wrote:
> > > On Thu, Jan 09, 2020 at 05:35:23PM +0000, ѽ҉ᶬḳ℠ wrote:
> > > > Thank you for the extensive feedback and explanation.
> > > > 
> > > > Apologies for having mixed up the semantics of module
> > > > specifications vs. EEPROM dump...
> > > > 
> > > > The module (chipset) was designed by Metanoia; I am not sure who
> > > > the actual manufacturer is, and it has probably just been branded
> > > > Allnet. The designer provides some proprietary management software
> > > > (called EBM) to their wholesale buyers only.
> > > I have one of their early MT-V5311 modules, but it has no accessible
> > > EEPROM, and even if it did, it would be of no use to me being
> > > unapproved for connection to the BT Openreach network.  (BT SIN 498
> > > specifies non-standard power profile to avoid crosstalk issues with
> > > existing ADSL infrastructure, and I believe they regularly check the
> > > connected modem type and firmware versions against an approved list.)
> > > 
> > > I haven't noticed the module I have asserting its TX_FAULT signal,
> > > but then its RJ45 has never been connected to anything.
> > > 
> > 
> > The curious (and sort of inexplicable) thing is that the module in
> > general works, i.e. at some point it must pass the sm checks, or
> > connectivity would be failing constantly and the module would be
> > generally unusable.
> > 
> > The reported issues, however, are intermittent, and usually reliably
> > reproducible with
> > 
> > ifdown <iface> && ifup <iface>
> > 
> > or rebooting the router that hosts the module.
> > 
> > If some time passes between ifdown and ifup (not sure how long, but it
> > seems to be in excess of 3 minutes), the sm checks mostly do not fail.
> > It somehow "feels" that the module is storing some link signal
> > information in a register which does not suit the sm check routine and
> > only when that register clears the sm check routine passes and
> > connectivity is restored.
> > ____
> > 
> > Since there are probably other such SFP modules, xDSL and g.fast, out
> > there that do not provide laser safety circuitry by design (since they
> > do not provide connectivity over fibre), would it not perhaps make sense
> > to check for the existence of laser safety circuitry first, prior to
> > getting to the sm checks?
> > ____
> > 
> 
> I am wondering whether the following, mentioned in
> https://gitlab.labs.nic.cz/turris/turris-build/issues/89, is perhaps the
> cause of the issue:
> 
> Even when/after the SFP module is recognized and the link mode is set for
> the NIC to the proper value there can still be the link-up signal mismatch
> that we have seen on many non-ethernet SFPs. The thing is that one of the
> SFP pins is called LOS (loss of signal) and when the pin is in active state
> it is being interpreted by the Linux kernel as "link is down", turn off the
> NIC. Unfortunately we have seen a chicken-and-egg problem with some GPON and
> DSL SFPs - the SFP does not come up and deassert LOS unless there is an SGMII
> link from the NIC, and the NIC is not coming up unless LOS is deasserted.

Also, note that the Metanoia MT-V5311 (at least mine) uses 1000BASE-X
not SGMII. It sends a 16-bit configuration word of 0x61a0, which is:

Bit  Value  1000BASE-X                  SGMII
15   0      No next page                Link down
14   1      Ack                         Ack
13   1      Remote fault 2              Reserved (0)
12   0      Remote fault 1              Duplex (0 = Half)

11   0      Reserved (0)                Speed bit 1
10   0      Reserved (0)                Speed bit 0 (00=10Mbps)
 9   0      Reserved (0)                Reserved (0)
 8   1      Asymmetric pause direction  Reserved (0)

 7   1      Pause                       Reserved (0)
 6   0      Half duplex not supported   Reserved (0)
 5   1      Full duplex supported       Reserved (0)
 4   0      Reserved (0)                Reserved (0)

 3   0      Reserved (0)                Reserved (0)
 2   0      Reserved (0)                Reserved (0)
 1   0      Reserved (0)                Reserved (0)
 0   0      Reserved (0)                Must be 1

So it clearly fits 802.3 Clause 37 1000BASE-X format, reporting 1G
Full duplex, and not SGMII (10M Half duplex).
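
For anyone wanting to repeat this decode on their own module, the
bit-by-bit reading above can be sketched in a few lines of Python.
This is only an illustration, not how the kernel does it: the function
names are invented for the example, the 1000BASE-X and SGMII layouts
are the ones from the table above, and the SGMII speed encodings other
than 00=10Mbps (01=100Mbps, 10=1000Mbps) are taken from the SGMII
specification rather than from this thread.

```python
# Illustrative only: function names are invented for this example, and the
# bit layouts follow the table in this mail.

def bit(word, n):
    """Return True if bit n of the 16-bit word is set."""
    return bool((word >> n) & 1)

def decode_1000basex(word):
    """Interpret the word as an 802.3 Clause 37 1000BASE-X config word."""
    return {
        "next_page": bit(word, 15),
        "ack": bit(word, 14),
        "remote_fault_2": bit(word, 13),
        "remote_fault_1": bit(word, 12),
        "asym_pause": bit(word, 8),
        "pause": bit(word, 7),
        "half_duplex": bit(word, 6),
        "full_duplex": bit(word, 5),
        # Bits 11:9 and 4:0 are reserved and expected to be zero.
        "reserved_clear": (word & 0x0E1F) == 0,
    }

def decode_sgmii(word):
    """Interpret the same word as an SGMII config word."""
    # Speed encodings beyond 00 are an assumption from the SGMII spec.
    speeds = {0: "10Mbps", 1: "100Mbps", 2: "1000Mbps", 3: "reserved"}
    return {
        "link_up": bit(word, 15),
        "ack": bit(word, 14),
        "full_duplex": bit(word, 12),
        "speed": speeds[(word >> 10) & 3],
        # Bit 13 and bits 9:1 are reserved; bit 0 must be 1.
        "reserved_clear": (word & 0x23FE) == 0,
        "bit0_set": bit(word, 0),
    }

word = 0x61A0
print(decode_1000basex(word))
print(decode_sgmii(word))
```

With 0x61a0, the 1000BASE-X reading has all reserved bits clear, while
the SGMII reading has a reserved bit set (bit 13) and bit 0 clear,
which is another way of seeing that the word is not SGMII.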

I have a platform here that lets me get at the raw config_reg word
that the other end has sent, which allows analysis as per the above.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
