[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200109213443.GS25745@shell.armlinux.org.uk>
Date: Thu, 9 Jan 2020 21:34:43 +0000
From: Russell King - ARM Linux admin <linux@...linux.org.uk>
To: ѽ҉ᶬḳ℠ <vtol@....net>
Cc: Andrew Lunn <andrew@...n.ch>, netdev@...r.kernel.org
Subject: Re: [drivers/net/phy/sfp] intermittent failure in state machine
checks
On Thu, Jan 09, 2020 at 07:01:10PM +0000, ѽ҉ᶬḳ℠ wrote:
> On 09/01/2020 17:43, Russell King - ARM Linux admin wrote:
> > On Thu, Jan 09, 2020 at 05:35:23PM +0000, ѽ҉ᶬḳ℠ wrote:
> > > Thank you for the extensive feedback and explanation.
> > >
> > > Pardon for having mixed up the semantics on module specifications vs. EEPROM
> > > dump...
> > >
> > > The module (chipset) been designed by Metanoia, not sure who is the actual
> > > manufacturer, and probably just been branded Allnet.
> > > The designer provides some proprietary management software (called EBM) to
> > > their wholesale buyers only
> > I have one of their early MT-V5311 modules, but it has no accessible
> > EEPROM, and even if it did, it would be of no use to me being
> > unapproved for connection to the BT Openreach network. (BT SIN 498
> > specifies non-standard power profile to avoid crosstalk issues with
> > existing ADSL infrastructure, and I believe they regularly check the
> > connected modem type and firmware versions against an approved list.)
> >
> > I haven't noticed the module I have asserting its TX_FAULT signal,
> > but then its RJ45 has never been connected to anything.
> >
>
> The curious (and sort of inexplicable) thing is that the module in general
> works, i.e. at some point it must pass the sm checks or connectivity would
> be failing constantly and thus the module being generally unusable.
It all depends what the module does with the TX_FAULT signal. The state
machine just follows what is layed down in the SFP MSA for dealing with
a transmit fault, although the attempts to clear it and the delay from
TX_FAULT being asserted to attempting to clear are decisions of my own.
It isn't a race in the state machine.
You can check the state of the GPIOs by looking at
/sys/kernel/debug/gpio, and you will probably see that TX_FAULT is
being asserted by the module.
I'm aware of something similar with a certain GPON module, but we
haven't been able to properly work out what is going on there either -
again, it seems pretty random what the module does with the TX_FAULT
signal.
> It somehow "feels" that the module is storing some link signal information
> in a register which does not suit the sm check routine and only when that
> register clears the sm check routine passes and connectivity is restored.
You're reading /way/ too much into the state machine. The state
machine is only concerned with two signals from the module. One
of them is the RX_LOS signal which indicates whether the module is
receiving valid signal. The other is TX_FAULT which is as I've
already described. Both of these are digital signals - either they
are asserted or deasserted, and the state machine will act
accordingly. It's rather simple.
> Since there are probably other such SFP modules, xDSL and g.fast, out there
> that do not provide laser safety circuitry by design (since not providing
> connectivity over fibre) would it perhaps not make sense to try checking for
> the existence of laser safety circuitry first prior getting to the sm
> checks?
There is no reliable way to do that; as I've already said, the EEPROM
contents is very hit and miss. Essentially, SFPs suck, almost nothing
can be really trusted with them.
This, I believe, is why commerical grade routers have this apparent
"vendor lockin" because no one can trust anyone elses EEPROM contents
to actually come close to the SFP MSA requirements - and then you have
modules that blatently violate the SFP MSA in respect of timings.
I would not be surprised if this module's behaviour with TX_FAULT is
along those same lines; the manufacturer has decided to use TX_FAULT
for some other purpose against the SFP MSA, which will cause problems
in any SFP MSA compliant host.
> Sometime in the past sfp.c was not available in the distro and the issue
> never exhibited. Back then the module's operations mode been set through a
> py script - see bottom - but it would appear that it did not implement any
> sm checks.
That python script is very simple. It reads the EEPROM, and attempts
to work out what kind of link to use. It doesn't care about any of
the SFP control and status signals. It doesn't care if you yank the
SFP out of the cage.
BTW, I notice in you original kernel that you have at least one of my
"experimental" patches on your stable kernel taken from my "phy" branch
which has never been in mainline, so I guess you're using the OpenWRT
kernel? I have submitted patches to bring the SFP state machine up to
what was in v5.4 (and a few extra bits) to the OpenWRT maintainers as
part of some commercial work. As I say, I'm not expecting much to
change as a result of those given what you've reported thus far.
As I've said, I think it may need a quirk so we ignore the TX_FAULT
signal. Sorting out a patch to do that for a 4.19.xx kernel is not
going to happen soon, as the hardware I was building the OpenWRT
kernel on isn't in a functional state at the moment - and given the
unknown status of the previously submitted patches as well, I'm not
inclined to produce any further patches for OpenWRT at the moment.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
Powered by blists - more mailing lists