Message-ID: <20200110114433.GZ25745@shell.armlinux.org.uk>
Date:   Fri, 10 Jan 2020 11:44:33 +0000
From:   Russell King - ARM Linux admin <linux@...linux.org.uk>
To:     ѽ҉ᶬḳ℠ <vtol@....net>
Cc:     Andrew Lunn <andrew@...n.ch>, netdev@...r.kernel.org
Subject: Re: [drivers/net/phy/sfp] intermittent failure in state machine
 checks

On Fri, Jan 10, 2020 at 09:50:00AM +0000, ѽ҉ᶬḳ℠ wrote:
> On 10/01/2020 09:27, Russell King - ARM Linux admin wrote:
> > On Thu, Jan 09, 2020 at 11:50:14PM +0000, ѽ҉ᶬḳ℠ wrote:
> > > On 09/01/2020 23:10, Russell King - ARM Linux admin wrote:
> > > > Please don't use mii-tool with SFPs that do not have a PHY; the "PHY"
> > > > registers are emulated, and are there just for compatibility. Please
> > > > use ethtool in preference, especially for SFPs.
> > > Sure, just ethtool is not much of help for this particular matter, all there
> > > is ethtool -m and according to you the EEPROM dump is not to be relied on.
> > How about just "ethtool eth2" ?
> 
> Settings for eth2:
>         Supported ports: [ TP ]
>         Supported link modes:   1000baseX/Full
>         Supported pause frame use: Symmetric
>         Supports auto-negotiation: Yes
>         Supported FEC modes: Not reported
>         Advertised link modes:  1000baseX/Full
>         Advertised pause frame use: Symmetric
>         Advertised auto-negotiation: Yes
>         Advertised FEC modes: Not reported
>         Speed: 1000Mb/s
>         Duplex: Full
>         Port: Twisted Pair
>         PHYAD: 0
>         Transceiver: internal
>         Auto-negotiation: on
>         MDI-X: Unknown
>         Supports Wake-on: d
>         Wake-on: d
>         Link detected: yes

That looks fine.

> > > > CONFIG_DEBUG_GPIO is not the same as having debugfs support enabled.
> > > > If debugfs is enabled, then gpiolib will provide the current state
> > > > of gpios through debugfs.  debugfs is normally mounted on
> > > > /sys/kernel/debug, but may not be mounted by default depending on
> > > > policy.  Looking in /proc/filesystems will tell you definitively
> > > > whether debugfs is enabled or not in the kernel.
> > > debugfs is mounted but ls -af /sys/kernel/debug/gpio only produces
> > > (oddly):
> > > 
> > > /sys/kernel/debug/gpio
> > Try "cat /sys/kernel/debug/gpio"
> 
> gpiochip2: GPIOs 504-511, parent: i2c/8-0071, pca9538, can sleep:
>  gpio-504 (                    |tx-fault            ) in  lo IRQ
>  gpio-505 (                    |tx-disable          ) out lo
>  gpio-506 (                    |rate-select0        ) in  lo
>  gpio-507 (                    |los                 ) in  lo IRQ
>  gpio-508 (                    |mod-def0            ) in  lo IRQ

That also indicates everything is correct.  When the problem occurs,
check the state of the signals again as close as possible to the
event; whether you catch it depends on how long the transceiver keeps
the signal asserted.  You will probably find tx-fault is indicating
"in  hi IRQ".
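
For repeated checks, the same information can be pulled out of the
debugfs text with a small script.  This is only a sketch: it assumes
the line layout shown in the output quoted above, and the function
names are made up for illustration.

```python
import re

# Matches lines of the form seen in the quoted output, e.g.
#  gpio-504 (                    |tx-fault            ) in  lo IRQ
GPIO_LINE = re.compile(
    r"^\s*gpio-(\d+)\s+\(\s*([^|]*?)\s*\|\s*(.*?)\s*\)"
    r"\s+(in|out)\s+(hi|lo)(?:\s+(.*))?$"
)


def parse_gpio_debugfs(text):
    """Return {consumer label: (direction, level, flags)} per gpio line."""
    signals = {}
    for line in text.splitlines():
        m = GPIO_LINE.match(line)
        if m:
            _num, _name, label, direction, level, flags = m.groups()
            signals[label] = (direction, level, (flags or "").strip())
    return signals


def tx_fault_asserted(text):
    """True if the tx-fault line reads 'hi'; None if there is no such line."""
    sig = parse_gpio_debugfs(text).get("tx-fault")
    return None if sig is None else sig[1] == "hi"
```

Feed it the contents of /sys/kernel/debug/gpio (reading that file
normally needs root) as soon as possible after the failure is logged.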

> Meantime Allnet responded, which basically sums up to (blame ping pong - it
> is not me but go and look there instead...)
> 
> - driver support is not being handled by Allnet but by Metanoia, latter
> being designer and manufacturer
> - Allnet does not have the buying power to persuade Metanoia to look into
> the matter

... which is pretty standard; no one will rework their SFP unless
they fear their sales will be severely impacted by the issue.

> - it would appear that SFP.C is trying to communicate with Fiber-GBIC and
> fails since the signal reports may not be 100% compatible

That's a fun claim, but note carefully the wording "may" which implies
some uncertainty in the statement.

Let's look at the wording of the GBIC (SFF-8053) and SFP (INF-8074 -
SFP MSA) documents.  The wording of the "fault recovery" text, which
concerns what happens when TX_FAULT is asserted and how to recover
from that, is identical between the two.

Concerning the implementation of TX_FAULT, SFF-8053 states:

  If no transmitter safety circuitry is implemented, the TX_FAULT signal
  may be tied to its negated state.

but then says later in the document:

  If TX_FAULT is not implemented, the signal shall be held to the low
  state by the GBIC.

Meanwhile, INF-8074 similarly states:

  If no transmitter safety circuitry is implemented, the TX_FAULT signal
  may be tied to its negated state.

but later on has a similar statement:

  TX_FAULT shall be implemented by those module definitions of SFP
  transceiver supporting safety circuitry. If TX_FAULT is not
  implemented, the signal shall be held to the low state by the SFP
  transceiver.

"shall" in both cases is stronger than "may".  So, there seems to be
little difference between the GBIC and SFP usage of this signal.

Their claim is that sfp.c implements the older GBIC style of signal
reports.  My counter-claim is that (a) sfp.c is written to the SFP MSA
and not the GBIC standard, and (b) there is no difference as far as the
TX_FAULT signal is concerned between the GBIC standard and the SFP MSA.

But... it doesn't matter that much: there's a module out there (and it
isn't the only one) which does "funny stuff" with its TX_FAULT signal.
Either we decide we want to support it and implement a quirk, or we
decide we don't want to support it.

There is an option bit in the EEPROM that is supposed to indicate
whether the module supports TX_FAULT, but, as you can guess, there are
problems with using that, as:

1) there are a lot of modules, particularly optical modules, that
   implement TX_FAULT correctly but don't set the option bit to say
   that they support the signal.

2) the other module I'm aware of that does "funny stuff" with its
   TX_FAULT signal does have the TX_FAULT option bit set.

So, the option bit is completely untrustworthy and, therefore, is
meaningless (which is why we don't use it.)
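
For anyone who wants to see what a given module claims, here is a
sketch of reading that bit from a raw EEPROM dump (for example one
saved with "ethtool -m eth2 raw on").  The byte offset and bit
position are assumptions based on the usual SFP EEPROM layout, not
something stated in this thread, and the function name is made up.

```python
# Assumptions (not from the message above): the options field sits at
# bytes 64-65 of the EEPROM, with "TX_FAULT implemented" at byte 65 bit 3.
OPTIONS_LOW_BYTE = 65
TX_FAULT_IMPLEMENTED = 1 << 3


def claims_tx_fault(eeprom: bytes) -> bool:
    """Return what the module's EEPROM claims about TX_FAULT support."""
    if len(eeprom) <= OPTIONS_LOW_BYTE:
        raise ValueError("EEPROM dump too short to contain the options field")
    return bool(eeprom[OPTIONS_LOW_BYTE] & TX_FAULT_IMPLEMENTED)
```

Given points 1) and 2) above, the result only tells you what the
module claims, not how its TX_FAULT signal actually behaves.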

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
