netdev - Re: mv88e6240 configuration broken for B850v3

Message-ID: <20211206215139.fv7xzqbnupk7pxfx@skbuf>
Date:   Mon, 6 Dec 2021 23:51:39 +0200
From:   Vladimir Oltean <olteanv@...il.com>
To:     "Russell King (Oracle)" <linux@...linux.org.uk>
Cc:     Martyn Welch <martyn.welch@...labora.com>,
        Andrew Lunn <andrew@...n.ch>,
        Vivien Didelot <vivien.didelot@...il.com>,
        Florian Fainelli <f.fainelli@...il.com>,
        netdev@...r.kernel.org, kernel@...labora.com
Subject: Re: mv88e6240 configuration broken for B850v3

On Mon, Dec 06, 2021 at 09:27:33PM +0000, Russell King (Oracle) wrote:
> On Mon, Dec 06, 2021 at 11:13:41PM +0200, Vladimir Oltean wrote:
> > On Mon, Dec 06, 2021 at 08:51:09PM +0000, Russell King (Oracle) wrote:
> > > With a bit of knowledge of how Marvell DSA switches work...
> > > 
> > > The "ppu" is the PHY polling unit. When the switch comes out of reset,
> > > the PPU probes the MDIO bus, and sets the bit in the port status
> > > register depending on whether it detects a PHY at the port address by
> > > way of the PHY ID values. This bit is used to enable polling of the
> > > PHY and is what mv88e6xxx_port_ppu_updates() reports. This bit will be
> > > set for all internal PHYs unless we explicitly turn it off (we don't.)
> > > Therefore, this is a reasonable assumption to make.
> > > 
> > > So, given that mv88e6xxx_port_ppu_updates() is most likely true as
> > > I stated, it is also true that mv88e6xxx_phy_is_internal() is
> > > "don't care".
> > 
> > And the reason why you bring the PPU into the discussion is because?
> > If the issue manifests itself with or without it, and you come up with a
> > proposal to set LINK_UNFORCED in mv88e6xxx_mac_config if the PPU is
> > used, doesn't that, logically speaking, still leave the issue unsolved
> > if the PPU is _not_ used for whatever reason?
> > The bug has nothing to do with the PPU. It can be solved by checking for
> > PPU in-band status as you say. Maybe. But I've got no idea why we don't
> > address the elephant in the room, which is in dsa_port_link_register_of()?
> 
> I think I've covered that in the other sub-thread.
> 
> It could be that a previous configuration left the port forced down.
> For example, if one were to kexec from one kernel that uses a
> fixed-link that forced the link down, into the same kernel with a
> different DT that uses PHY mode.
> 
> The old kernel may have called mac_link_down(MLO_AN_FIXED), and the
> new kernel wouldn't know that. It comes along, and goes through the
> configuration process and calls mac_link_up(MLO_AN_PHY)... and from
> what you're suggesting, because these two calls use different MLO_AN_xxx
> constants that's a bug.

Indeed I don't have detailed knowledge of Marvell hardware, but I'm
surprised to see kexec being mentioned here as a potential source of
configurations which the driver does not expect to handle. My belief was
that kexec's requirements would be just to silence the device
sufficiently that it doesn't cause any surprises when things such as
interrupts are enabled (DMA isn't relevant for DSA switches).
It wouldn't be responsible for leaving the hardware in any other state
otherwise.

I see this logic in the driver, does it not take care of bringing the
ports to a known state, regardless of what a previous boot stage may
have done?

static int mv88e6xxx_switch_reset(struct mv88e6xxx_chip *chip)
{
	int err;

	err = mv88e6xxx_disable_ports(chip);
	if (err)
		return err;

	mv88e6xxx_hardware_reset(chip);

	return mv88e6xxx_software_reset(chip);
}

So unless I'm fooled by mentally putting an equality sign between
mv88e6xxx_switch_reset() and getting rid of whatever a previous kernel
may have done, I don't think at all that the two cases are comparable:
kexec and a previous call to mv88e6xxx_mac_link_down() initiated by
dsa_port_link_register_of() from this kernel.
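
To make the distinction concrete, here is a toy model in C. It is only a
sketch: every toy_* name is hypothetical and none of it is mv88e6xxx code.
It also bakes in the very assumption being questioned above, namely that the
reset returns every port to an unforced link state. Under that assumption,
a link forced down before the reset (the kexec case) does not survive it,
while a link forced down after the reset by the same kernel does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and helpers are hypothetical, not driver code. */
#define TOY_NUM_PORTS 4

struct toy_port {
	bool enabled;    /* port is forwarding */
	bool force_link; /* true: software forces the link state */
	bool forced_up;  /* forced value, meaningful only if force_link */
	bool phy_link;   /* link state the PHY would report */
};

struct toy_chip {
	struct toy_port ports[TOY_NUM_PORTS];
};

/* Effective link state as seen by the MAC. */
static bool toy_port_link(const struct toy_port *p)
{
	return p->force_link ? p->forced_up : p->phy_link;
}

/* Models a mac_link_down() that forces the link down. */
static void toy_force_link_down(struct toy_port *p)
{
	p->force_link = true;
	p->forced_up = false;
}

/*
 * ASSUMPTION: the reset disables all ports and clears any forced link
 * state. Whether the real hardware does this is the open question.
 */
static int toy_switch_reset(struct toy_chip *chip)
{
	size_t i;

	for (i = 0; i < TOY_NUM_PORTS; i++) {
		chip->ports[i].enabled = false;
		chip->ports[i].force_link = false;
		chip->ports[i].forced_up = false;
	}

	return 0;
}

/* kexec case: the previous kernel forced the link down before our reset. */
static bool toy_link_after_kexec(void)
{
	struct toy_chip chip = { 0 };

	chip.ports[1].phy_link = true;
	toy_force_link_down(&chip.ports[1]); /* done by the old kernel */
	toy_switch_reset(&chip);             /* done by the new kernel */

	return toy_port_link(&chip.ports[1]);
}

/* Same-kernel case: the link is forced down after our own reset. */
static bool toy_link_after_own_force_down(void)
{
	struct toy_chip chip = { 0 };

	chip.ports[1].phy_link = true;
	toy_switch_reset(&chip);
	toy_force_link_down(&chip.ports[1]); /* done by this kernel */

	return toy_port_link(&chip.ports[1]);
}
```

In this model toy_link_after_kexec() reports the PHY's link state, whereas
toy_link_after_own_force_down() stays down until something explicitly
unforces the link.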

> 
> An alternative: the hardware boots up with the link forced down. The
> boot loader doesn't touch it. The kernel boots and calls
> mac_link_up(MLO_AN_PHY).

Again, in my simplistic view, the switch reset deals with this too.
Maybe I'm wrong.

> This all works as expected with e.g. mvneta. It doesn't work with
> Marvell DSA because we have all these additional extra exceptional
> cases to deal with the PPU (which is what _actually_ transfers the
> PHY status to the port registers for all PHYs.)
> 
> We used to just rely on the PPU bit for making the decision, but when
> I introduced that helper, I forgot that the PPU bit doesn't exist on
> the 6250 family, which resulted in commit 4a3e0aeddf09. Looking at
> 4a3e0aeddf09, I now believe the fix there to be wrong. It should
> have made mv88e6xxx_port_ppu_updates() follow
> mv88e6xxx_phy_is_internal() for internal ports only for the 6250 family
> that has the link status bit in that position, especially as one can
> disable the PPU bit in DSA switches such as 6390, which for some ports
> stops the PHY being used and switches the port to serdes mode.
> "Internal" ports aren't always internal on these switches.
> 
> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!