Message-ID: <20200712132554.GS1551@shell.armlinux.org.uk>
Date: Sun, 12 Jul 2020 14:25:54 +0100
From: Russell King - ARM Linux admin <linux@...linux.org.uk>
To: Martin Rowe <martin.p.rowe@...il.com>
Cc: Andrew Lunn <andrew@...n.ch>, netdev@...r.kernel.org,
davem@...emloft.net, vivien.didelot@...il.com
Subject: Re: bug: net: dsa: mv88e6xxx: unable to tx or rx with Clearfog GT 8K
(with git bisect)
On Sun, Jul 12, 2020 at 01:00:48PM +0000, Martin Rowe wrote:
> On Sat, 11 Jul 2020 at 19:23, Russell King - ARM Linux admin
> <linux@...linux.org.uk> wrote:
> > On Sat, Jul 11, 2020 at 06:23:49PM +0200, Andrew Lunn wrote:
> > > So i'm guessing it is the connection between the CPU and the switch.
> > > Could you confirm this? Create a bridge, add two ports of the switch
> > > to the bridge, and then see if packets can pass between switch ports.
> > >
> > > If it is the connection between the CPU and the switch, i would then
> > > be thinking about the comphy and the firmware. We have seen issues
> > > where the firmware is too old. That is not something i've debugged
> > > myself, so i don't know where the version information is, or what
> > > version is required.
> >
> > However, in the report, Martin said that reverting the problem commit
> > from April 14th on a kernel from July 6th caused everything to work
> > again. That is quite conclusive that 34b5e6a33c1a is the cause of
> > the breakage.
>
> I tried it anyway and couldn't get any traffic to flow between the
> ports, but I could have configured it wrongly. I gave each port a
> static IP, bridged them (with and without br0 having an IP assigned),
> and tried pinging from one port to the other. I tried with the
> assigned IPs in the same and different subnets, and made sure the
> routes were updated between tests. Tx only, no responses, exactly like
> pinging a remote host.
Note that you shouldn't need to set an IP address on the bridge ports
themselves.
If you do this:
# ip li set dev eth1 up
# brctl addbr br0
# for port in lan1 lan2 lan3 lan4; do ip li set dev $port up; \
brctl addif br0 $port; done
Then you should be able to pass traffic between the LAN ports - the
packets should stay on the DSA switch and should not involve the CPU.
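
For systems without bridge-utils, the same setup can be sketched with
iproute2 alone. This is only a sketch and is not from the thread: the
run() helper and the DRY_RUN variable are hypothetical, added so the
commands can be previewed without root, and bringing br0 up at the end
is an assumption (the brctl commands above do not do it). The interface
names eth1 and lan1..lan4 are the ones used above.

```shell
#!/bin/sh
# Sketch: iproute2 equivalent of the brctl commands above.
# DRY_RUN and run() are hypothetical helpers, not part of any tool.
DRY_RUN=${DRY_RUN:-1}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

setup_bridge() {
    run ip link set dev eth1 up
    run ip link add name br0 type bridge
    for port in lan1 lan2 lan3 lan4; do
        run ip link set dev "$port" up
        run ip link set dev "$port" master br0
    done
    # Assumption: the bridge device itself is brought up as well.
    run ip link set dev br0 up
}
```

Calling setup_bridge prints the commands by default; setting DRY_RUN=0
and running as root would apply them.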
If you have machine A with address 192.168.2.1/24 on lan1 and machine B
with address 192.168.2.2/24 on lan2, then they should be able to ping
each other - the packet flow will be through the DSA switch without
involving the CPU.
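
The two-machine test could look roughly like the sketch below, run on
each test machine in turn. The interface name passed in (eth0 in the
usage note) is a placeholder for whatever the machine's own NIC is
called, and run() with DRY_RUN is again a hypothetical preview helper;
only the addresses come from the example above.

```shell
#!/bin/sh
# Sketch: configure one test machine and ping the other across the switch.
# DRY_RUN and run() are hypothetical helpers for previewing the commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# usage: test_host <iface> <own-address/prefix> <peer-address>
test_host() {
    run ip addr add "$2" dev "$1"
    run ip link set dev "$1" up
    run ping -c 3 "$3"
}
```

On machine A this would be "test_host eth0 192.168.2.1/24 192.168.2.2",
and on machine B "test_host eth0 192.168.2.2/24 192.168.2.1".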
If that doesn't work, then the next step is to directly connect machine
A to machine B and confirm that works. If it works there, but does not
work when connected to the DSA switch, then it points to the DSA LAN
ports being incorrectly configured.
At that point, what may help is to get a dump of the registers
associated with each of the ports:
# ethtool -d lan<N>
and then we can see how the kernel is configuring them.
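
To collect the dumps for all ports in one go, something like the
following sketch could be used. The output directory, the .regs file
naming and the ETHTOOL override are assumptions made for illustration;
the only command taken from above is "ethtool -d".

```shell
#!/bin/sh
# Sketch: save "ethtool -d" output for each given port to its own file.
# ETHTOOL is a hypothetical override of the binary name (default: ethtool).

# usage: dump_ports <outdir> <port>...
dump_ports() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for port in "$@"; do
        # stdout and stderr both go to the file so errors are kept too
        "${ETHTOOL:-ethtool}" -d "$port" > "$outdir/$port.regs" 2>&1 ||
            echo "ethtool -d $port failed" >&2
    done
}
```

For example, "dump_ports /tmp/regs lan1 lan2 lan3 lan4" would leave one
file per port in /tmp/regs.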
If it is a port issue, that should help pinpoint it. If it's a problem
with the CPU port configuration, however, ethtool can't read those
registers; the only way to get them is to apply a debugfs patch that
was rejected for mainline.
> I'm now less confident about my git bisect, though, because it appears
> my criterion for verifying if a commit was "good" was not sufficient. I
> was just checking to see if the port could get assigned a DHCP address
> and ping something else, but it appears that (at least on 5.8-rc4 with
> the one revert) the interface "dies" after working for about 30-60
> seconds. Basically the symptoms I described originally, just preceded
> by 30-60 seconds of it working perfectly. I will re-run the bisect to
> figure out what makes it go from "working perfectly" to "working
> perfectly for less than a minute", which will take a few days.
That seems really weird!
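
Given that the link works for 30-60 seconds before dying, a stricter
"good" check for the re-run bisect might keep probing for longer than
that window. The following is only a sketch under assumptions: the
target address, the duration and the PING override are placeholders,
and the -W timeout option is the one from iputils ping.

```shell
#!/bin/sh
# Sketch: succeed only if the target keeps answering for the whole period.
# PING is a hypothetical override of the ping binary (default: ping).

# usage: link_stays_up <target> <probes>
# Sends one probe roughly every second; returns 1 on the first failure.
link_stays_up() {
    target=$1
    remaining=$2
    while [ "$remaining" -gt 0 ]; do
        # single probe with a 1 second reply timeout
        ${PING:-ping} -c 1 -W 1 "$target" >/dev/null 2>&1 || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

After booting each candidate kernel, "link_stays_up <host> 120" would
then mark the commit good only if the link survives well past the
first minute.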
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!