Message-ID: <bb2c2eed-5efa-00f6-0e52-1326669c1b0d@gmx.net>
Date:   Thu, 9 Jan 2020 17:35:23 +0000
From:   ѽ҉ᶬḳ℠ <vtol@....net>
To:     Russell King - ARM Linux admin <linux@...linux.org.uk>
Cc:     Andrew Lunn <andrew@...n.ch>, netdev@...r.kernel.org
Subject: Re: [drivers/net/phy/sfp] intermittent failure in state machine
 checks


Kai Meitzner
On 09/01/2020 15:58, Russell King - ARM Linux admin wrote:
> On Thu, Jan 09, 2020 at 03:03:24PM +0000, ѽ҉ᶬḳ℠ wrote:
>> On 09/01/2020 14:41, Andrew Lunn wrote:
>>> On Thu, Jan 09, 2020 at 01:47:31PM +0000, ѽ҉ᶬḳ℠ wrote:
>>>> On a node running 4.19.93 with an SFP module (specs at the bottom) the
>>>> following is intermittently observed:
>>> Please make sure Russell King is in Cc: for SFP issues.
>>>
>>> The state machine has been reworked recently. Please could you try
>>> net-next, or 5.5-rc5.
>>>
>>> Thanks
>>> 	Andrew
>> Unfortunately, testing those branches is not feasible since the router (see
>> architecture below) that hosts the SFP module runs the OpenWrt downstream
>> distro with LTS kernels; in its master development branch, 4.19.93 is
>> the most recent on offer.
> I don't think the rework will make any difference in this case, and
> I don't think there's anything failing in the software here.  The
> reported problem seems to be this:
>
>   sfp sfp: module transmit fault indicated
>   sfp sfp: module transmit fault recovered
>   sfp sfp: module transmit fault indicated
>   sfp sfp: module persistently indicates fault, disabling
>
> which occurs if the module asserts the TX_FAULT signal.  The SFP MSA
> defines that this indicates a problem with the laser safety circuitry,
> and defines a way to reset the fault (by pulsing TX_DISABLE and going
> through another initialisation).
>
> When TX_FAULT is asserted for the first time, "module transmit fault
> indicated" is printed, and we start the process of recovery.  If we
> successfully recover, then "module transmit fault recovered" will be
> printed.
>
> We try several times to recover the fault, and once we're out of
> retries, "module persistently indicates fault, disabling" will be
> printed; at that point, we've declared the module to be dead, and
> we won't do anything further with it.
>
> This is by design; if the module is saying that the laser safety
> circuitry is faulty, then endlessly resetting the module to recover
> from that fault is not sane.
>
> However, there's some modules (particularly GPON modules) that do
> things quite differently from what the SFP MSA says, which is
> extremely annoying and frustrating for those of us who are trying to
> implement the host support.  There are some which seem to assert
> TX_FAULT for unknown reasons.
>
> In your original post (which you need to have sent to me, I don't
> read netdev) you've provided "SFP module specs" - not really, you
> provided the ethtool output, which is not the same as the module
> specs.  Many modules have misleading EEPROM information, sometimes
> to work around what people call "vendor lockin" or maybe to get
> their module to work in some specific equipment.  In any case,
> EEPROM information is not a specification.
>
> For example, your module claims to be a 1000BASE-SX module.  If
> I lookup "allnet ALL4781", I find that it's a VDSL2 module.  That
> isn't a 1000BASE-SX module - 1000BASE-SX is an IEEE 802.3 defined
> term to mean 1000BASE-X over fiber using a short-wavelength laser.
>
> So, given that it doesn't have a laser, why is it raising TX_FAULT?
> No idea; these modules are a law unto themselves.
>
> I think the only thing we could do is to implement a workaround to
> ignore TX_FAULT for this module... great, more quirks. :(
>
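
The recovery behaviour described above can be sketched as a small
simulation. This is only an illustration of the logic as explained in the
quoted text, not the kernel's sfp.c code: the class name TxFaultMonitor,
the replay() helper, and the retry budget of 3 are all hypothetical, and
each recovery attempt (pulsing TX_DISABLE and re-initialising) is reduced
to decrementing a counter.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TxFaultMonitor:
    """Toy model of the TX_FAULT handling described in the mail (hypothetical)."""

    retries_left: int = 3          # assumed budget; not taken from the driver
    in_fault: bool = False         # True while a fault is being recovered
    disabled: bool = False         # True once the module is declared dead
    log: List[str] = field(default_factory=list)

    def observe(self, tx_fault: bool) -> None:
        """Feed one sample of the TX_FAULT signal into the model."""
        if self.disabled:
            return  # a module declared dead is never touched again
        if tx_fault:
            if self.retries_left == 0:
                self.disabled = True
                self.log.append("module persistently indicates fault, disabling")
                return
            if not self.in_fault:
                self.log.append("module transmit fault indicated")
            self.in_fault = True
            # One recovery attempt: pulse TX_DISABLE and re-initialise.
            self.retries_left -= 1
        elif self.in_fault:
            self.in_fault = False
            self.log.append("module transmit fault recovered")


def replay(samples: List[bool], retries: int = 3) -> List[str]:
    """Run a sequence of TX_FAULT samples and return the resulting messages."""
    monitor = TxFaultMonitor(retries_left=retries)
    for sample in samples:
        monitor.observe(sample)
    return monitor.log


if __name__ == "__main__":
    for line in replay([True, False, True, True, True, False]):
        print("sfp sfp:", line)
```

With the sample sequence in the __main__ block (fault, clear, then a fault
that stays asserted), the model emits the same four messages as the
reported log. The retry budget is never refilled after a recovery, which is
why a module that keeps re-asserting TX_FAULT ends up disabled.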

Thank you for the extensive feedback and explanation.

Apologies for having mixed up module specifications with the
EEPROM dump.

The module (chipset) was designed by Metanoia; I am not sure who the
actual manufacturer is, and it has probably just been branded by Allnet.
The designer provides proprietary management software (called EBM)
to its wholesale buyers only.

Through EBM (I got hold of something called DSLmonitor Lite) it reports as:

- board type                            AURORA
- Mode / Operation mode        RT / VDSL

If it would be of any help, I could provide what the EBM software calls a
"SoC dump".

I opened a support ticket with Allnet a few days ago; their response is
yet to arrive.
