Message-ID: <20200109231034.GW25745@shell.armlinux.org.uk>
Date:   Thu, 9 Jan 2020 23:10:34 +0000
From:   Russell King - ARM Linux admin <linux@...linux.org.uk>
To:     ѽ҉ᶬḳ℠ <vtol@....net>
Cc:     Andrew Lunn <andrew@...n.ch>, netdev@...r.kernel.org
Subject: Re: [drivers/net/phy/sfp] intermittent failure in state machine
 checks

On Thu, Jan 09, 2020 at 10:40:24PM +0000, ѽ҉ᶬḳ℠ wrote:
> 
> On 09/01/2020 21:59, Russell King - ARM Linux admin wrote:
> > 
> > Also, note that the Metanoia MT-V5311 (at least mine) uses 1000BASE-X
> > not SGMII. It sends a 16-bit configuration word of 0x61a0, which is:
> > 
> > 		1000BASE-X			SGMII
> > Bit 15	0	No next page			Link down
> > 	1	Ack				Ack
> > 	1	Remote fault 2			Reserved (0)
> > 	0	Remote fault 1			Duplex (0 = Half)
> > 
> > 	0	Reserved (0)			Speed bit 1
> > 	0	Reserved (0)			Speed bit 0 (00=10Mbps)
> > 	0	Reserved (0)			Reserved (0)
> > 	1	Asymmetric pause direction	Reserved (0)
> > 
> > 	1	Pause				Reserved (0)
> > 	0	Half duplex not supported	Reserved (0)
> > 	1	Full duplex supported		Reserved (0)
> > 	0	Reserved (0)			Reserved (0)
> > 
> > 	0	Reserved (0)			Reserved (0)
> > 	0	Reserved (0)			Reserved (0)
> > 	0	Reserved (0)			Reserved (0)
> > Bit 0	0	Reserved (0)			Must be 1
> > 
> > So it clearly fits 802.3 Clause 37 1000BASE-X format, reporting 1G
> > Full duplex, and not SGMII (10M Half duplex).
> > 
> > I have a platform here that allows me to get at the raw config_reg
> > word that the other end has sent which allows analysis as per the
> > above.
> > 
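
As a rough illustration of the bit-by-bit analysis quoted above, here is a
minimal sketch that decodes a 16-bit config word under both interpretations.
The function names and dictionary keys are made up for this example; the bit
positions are taken from the table above. It is not how the kernel does it,
just a way to check a raw word by hand.

```python
def decode_1000basex(word):
    """Interpret a 16-bit config word using the 1000BASE-X column of the table."""
    return {
        "next_page": bool(word & 0x8000),        # bit 15
        "ack": bool(word & 0x4000),              # bit 14
        "remote_fault_2": bool(word & 0x2000),   # bit 13
        "remote_fault_1": bool(word & 0x1000),   # bit 12
        "asym_pause": bool(word & 0x0100),       # bit 8
        "pause": bool(word & 0x0080),            # bit 7
        "half_duplex": bool(word & 0x0040),      # bit 6
        "full_duplex": bool(word & 0x0020),      # bit 5
    }


def decode_sgmii(word):
    """Interpret the same word using the SGMII column of the table."""
    speeds = {0: 10, 1: 100, 2: 1000}            # speed bits 11:10, in Mbps
    return {
        "link_up": bool(word & 0x8000),          # bit 15
        "ack": bool(word & 0x4000),              # bit 14
        "full_duplex": bool(word & 0x1000),      # bit 12 (0 = half)
        "speed_mbps": speeds.get((word >> 10) & 0x3),
        "bit0_set": bool(word & 0x0001),         # must be 1 for SGMII
    }


if __name__ == "__main__":
    word = 0x61A0
    print("1000BASE-X:", decode_1000basex(word))
    print("SGMII:     ", decode_sgmii(word))
```

For 0x61a0 the SGMII reading gives link down, 10Mbps half duplex, with bit 0
clear, while the 1000BASE-X reading gives full duplex with pause bits set,
which is why the word is taken to be 1000BASE-X.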
> 
> The driver also reports 1000base-x for this Metanoia/Allnet module:
> 
> mvneta f1034000.ethernet eth2: switched to inband/1000base-x link mode
> 
> mii-tool -v eth2 produces:
> 
> eth2: 1000 Mbit, full duplex, link ok
>   product info: vendor 00:00:00, model 0 rev 0
>   basic mode:   10 Mbit, full duplex
>   basic status: autonegotiation complete, link ok
>   capabilities:
>   advertising:  1000baseT-HD 1000baseT-FD 100baseT4 100baseTx-FD
> 100baseTx-HD 10baseT-FD 10baseT-HD flow-control

Please don't use mii-tool with SFPs that do not have a PHY; the "PHY"
registers are emulated, and are there just for compatibility. Please
use ethtool in preference, especially for SFPs.

> On 09/01/2020 21:34, Russell King - ARM Linux admin wrote:
> > You can check the state of the GPIOs by looking at
> > /sys/kernel/debug/gpio, and you will probably see that TX_FAULT is
> > being asserted by the module.
> 
> With OpenWrt trying to save space wherever they can
> 
> # CONFIG_DEBUG_GPIO is not set
> 
> this avenue is unfortunately not available. Is there some other way
> (from Linux userland) to query TX_FAULT and RX_LOS and see whether either
> or both are asserted or deasserted?

CONFIG_DEBUG_GPIO is not the same as having debugfs support enabled.
If debugfs is enabled, then gpiolib will provide the current state
of gpios through debugfs.  debugfs is normally mounted on
/sys/kernel/debug, but may not be mounted by default depending on
policy.  Looking in /proc/filesystems will tell you definitively
whether debugfs is enabled or not in the kernel.
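
For illustration only, a minimal sketch of both checks: whether debugfs is
listed in /proc/filesystems, and what level each GPIO shows in the debugfs
gpio file. The function names are invented for this example. The sample gpio
text, including the chip line, the GPIO numbers and the "tx-fault" and "los"
labels, is an assumption about what the file typically looks like, not output
from the system discussed here. Whether "hi" means asserted also depends on
how the board wires the signal.

```python
import re

# Assumed shape of a gpiolib debugfs line:
#  gpio-503 (name                |label               ) in  lo
GPIO_LINE = re.compile(r"gpio-(\d+)\s*\(([^|]*)\|([^)]*)\)\s+(in|out)\s+(hi|lo)")


def has_filesystem(proc_filesystems_text, name="debugfs"):
    """Return True if `name` appears as a filesystem type in /proc/filesystems."""
    for line in proc_filesystems_text.splitlines():
        fields = line.split()
        # Lines are either "nodev <fstype>" or just "<fstype>".
        if fields and fields[-1] == name:
            return True
    return False


def gpio_levels(debug_gpio_text):
    """Map each GPIO label found in the debugfs gpio text to 'hi' or 'lo'."""
    levels = {}
    for line in debug_gpio_text.splitlines():
        match = GPIO_LINE.search(line)
        if match:
            levels[match.group(3).strip()] = match.group(5)
    return levels


# Hypothetical sample data, for demonstration only.
SAMPLE_FILESYSTEMS = "nodev\tsysfs\nnodev\tproc\nnodev\tdebugfs\n\text4\n"
SAMPLE_GPIO = (
    "gpiochip0: GPIOs 480-511, parent: platform/example, example-gpio:\n"
    " gpio-503 (                    |tx-fault            ) in  lo\n"
    " gpio-504 (                    |los                 ) in  hi\n"
)

if __name__ == "__main__":
    print(has_filesystem(SAMPLE_FILESYSTEMS))
    print(gpio_levels(SAMPLE_GPIO))
```

On a real system the inputs would come from reading /proc/filesystems and
/sys/kernel/debug/gpio. If debugfs is listed but not mounted, it can usually
be mounted with "mount -t debugfs none /sys/kernel/debug".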

> On 09/01/2020 21:34, Russell King - ARM Linux admin wrote:
> > BTW, I notice in your original kernel that you have at least one of my
> > "experimental" patches on your stable kernel taken from my "phy" branch
> > which has never been in mainline, so I guess you're using the OpenWRT
> > kernel?
> I am not aware of where the code originated. It is not exactly OpenWrt but
> TOS (for the Turris Omnia router), a downstream patchset that builds
> on top of OpenWrt. The TOS developers might be known in Linux kernel
> development; they recently added their MOX platform and have also worked on
> Multi-CPU-DSA.

So, if that is correct...

Current OpenWRT is derived from 4.19-stable kernels, which include
experimental patches picked at some point from my "phy" branch, and
TOS is derived from OpenWRT.

That makes it very difficult for anyone in the mainline kernel
community to do anything about this; sending you a patch is likely
useless since you're not going to be able to test it.

> On 09/01/2020 21:34, Russell King - ARM Linux admin wrote:
> > You're reading /way/ too much into the state machine.
> 
> How so? Those intermittent failures cause disruption in the WAN connectivity
> - nothing life threatening but somewhat inconvenient.

You think the state machines are doing something clever. They aren't.
They are all very simple and quite dumb.

> I am trying to get to the bottom of it, with my limited capabilities, and
> your input has helped. I will ping Allnet again and see whether they
> bother to respond and shed some light on what their module does with regard
> to TX_FAULT and RX_LOS.

The only real way to get to the bottom of it is to manually enable
debug in sfp.c so it's possible to watch what happens, not only with
the hardware signals but also what the state machines are doing.
However, I'm very certain that there is no problem with the state
machines, and that the Allnet module is raising TX_FAULT.

I also think, from what you've said above, that rebuilding a kernel
to enable debug in sfp.c is not going to be possible for you.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
