netdev - Re: mv88e6240 configuration broken for B850v3

Date:   Tue, 7 Dec 2021 01:44:43 +0200
From:   Vladimir Oltean <olteanv@...il.com>
To:     "Russell King (Oracle)" <linux@...linux.org.uk>
Cc:     Martyn Welch <martyn.welch@...labora.com>,
        Andrew Lunn <andrew@...n.ch>,
        Vivien Didelot <vivien.didelot@...il.com>,
        Florian Fainelli <f.fainelli@...il.com>,
        netdev@...r.kernel.org, kernel@...labora.com
Subject: Re: mv88e6240 configuration broken for B850v3

On Mon, Dec 06, 2021 at 10:22:15PM +0000, Russell King (Oracle) wrote:
> On Mon, Dec 06, 2021 at 11:51:39PM +0200, Vladimir Oltean wrote:
> > On Mon, Dec 06, 2021 at 09:27:33PM +0000, Russell King (Oracle) wrote:
> > > On Mon, Dec 06, 2021 at 11:13:41PM +0200, Vladimir Oltean wrote:
> > > > On Mon, Dec 06, 2021 at 08:51:09PM +0000, Russell King (Oracle) wrote:
> > > > > With a bit of knowledge of how Marvell DSA switches work...
> > > > > 
> > > > > The "ppu" is the PHY polling unit. When the switch comes out of reset,
> > > > > the PPU probes the MDIO bus, and sets the bit in the port status
> > > > > register depending on whether it detects a PHY at the port address by
> > > > > way of the PHY ID values. This bit is used to enable polling of the
> > > > > PHY and is what mv88e6xxx_port_ppu_updates() reports. This bit will be
> > > > > set for all internal PHYs unless we explicitly turn it off (we don't.)
> > > > > Therefore, this is a reasonable assumption to make.
> > > > > 
> > > > > So, given that mv88e6xxx_port_ppu_updates() is most likely true as
> > > > > I stated, it is also true that mv88e6xxx_phy_is_internal() is
> > > > > "don't care".
> > > > 
> > > > And the reason why you bring the PPU into the discussion is because?
> > > > If the issue manifests itself with or without it, and you come up with a
> > > > proposal to set LINK_UNFORCED in mv88e6xxx_mac_config if the PPU is
> > > > used, doesn't that, logically speaking, still leave the issue unsolved
> > > > if the PPU is _not_ used for whatever reason?
> > > > The bug has nothing to do with the PPU. It can be solved by checking for
> > > > PPU in-band status as you say. Maybe. But I've got no idea why we don't
> > > > address the elephant in the room, which is in dsa_port_link_register_of()?
> > > 
> > > I think I've covered that in the other sub-thread.
> > > 
> > > It could be that a previous configuration left the port forced down.
> > > For example, if one were to kexec from one kernel that uses a
> > > fixed-link that forced the link down, into the same kernel with a
> > > different DT that uses PHY mode.
> > > 
> > > The old kernel may have called mac_link_down(MLO_AN_FIXED), and the
> > > new kernel wouldn't know that. It comes along, and goes through the
> > > configuration process and calls mac_link_up(MLO_AN_PHY)... and from
> > > what you're suggesting, because these two calls use different MLO_AN_xxx
> > > constants that's a bug.
> > 
> > Indeed I don't have detailed knowledge of Marvell hardware, but I'm
> > surprised to see kexec being mentioned here as a potential source of
> > configurations which the driver does not expect to handle. My belief was
> > that kexec's requirements would be just to silence the device
> > sufficiently such that it doesn't cause any surprises when things such
> > as interrupts are enabled (DMA isn't relevant for DSA switches).
> > It wouldn't be responsible for leaving the hardware in any other state
> > otherwise.
> > 
> > I see this logic in the driver, does it not take care of bringing the
> > ports to a known state, regardless of what a previous boot stage may
> > have done?
> > 
> > static int mv88e6xxx_switch_reset(struct mv88e6xxx_chip *chip)
> > {
> > 	int err;
> > 
> > 	err = mv88e6xxx_disable_ports(chip);
> > 	if (err)
> > 		return err;
> > 
> > 	mv88e6xxx_hardware_reset(chip);
> > 
> > 	return mv88e6xxx_software_reset(chip);
> > }
> > 
> > So unless I'm fooled by mentally putting an equality sign between
> > mv88e6xxx_switch_reset() and getting rid of whatever a previous kernel
> > may have done, I don't think at all that the two cases are comparable:
> > kexec and a previous call to mv88e6xxx_mac_link_down() initiated by
> > dsa_port_link_register_of() from this kernel.
> 
> If the hardware reset is not wired to be under software control or is
> not specified, then mv88e6xxx_hardware_reset() is a no-op.
> 
> mv88e6xxx_software_reset() does not fully reinitialise the switch.
> To quote one switch manual for the SWReset bit "Register values are not
> modified." That means if the link was forced down previously by writing
> to the port control register, the port remains forced down until
> software changes that register to unforce the link, or to force the
> link up.
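
To make the quoted point concrete, here is a toy model in C of a
software reset that leaves register values alone. It is only a sketch:
the toy_* names, the bit layout and the struct are invented for
illustration and are not the mv88e6xxx register map or driver API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NUM_PORTS	4

/* Invented bit layout, for illustration only. */
#define TOY_FORCE_LINK	((uint16_t)0x0001)	/* software overrides the link state */
#define TOY_LINK_UP	((uint16_t)0x0002)	/* forced value: set = up, clear = down */

struct toy_switch {
	uint16_t port_ctl[TOY_NUM_PORTS];	/* per-port control register */
	bool phy_link[TOY_NUM_PORTS];		/* link state reported by the PHY */
	unsigned int resets;			/* number of software resets seen */
};

static void toy_force_link(struct toy_switch *sw, int port, bool up)
{
	assert(port >= 0 && port < TOY_NUM_PORTS);
	sw->port_ctl[port] |= TOY_FORCE_LINK;
	if (up)
		sw->port_ctl[port] |= TOY_LINK_UP;
	else
		sw->port_ctl[port] &= (uint16_t)~TOY_LINK_UP;
}

static void toy_unforce_link(struct toy_switch *sw, int port)
{
	assert(port >= 0 && port < TOY_NUM_PORTS);
	sw->port_ctl[port] &= (uint16_t)~(TOY_FORCE_LINK | TOY_LINK_UP);
}

/*
 * Models a reset of the kind described above: internal state restarts,
 * but port_ctl[] is deliberately not touched.
 */
static void toy_software_reset(struct toy_switch *sw)
{
	sw->resets++;
}

static bool toy_link_is_up(const struct toy_switch *sw, int port)
{
	assert(port >= 0 && port < TOY_NUM_PORTS);
	if (sw->port_ctl[port] & TOY_FORCE_LINK)
		return (sw->port_ctl[port] & TOY_LINK_UP) != 0;
	return sw->phy_link[port];
}
```

In this model, a port that was forced down before toy_software_reset()
still reads as down afterwards, even if its PHY reports link, and only
comes back after toy_unforce_link() or toy_force_link(..., true).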

Ouch, this is pretty unfortunate if true. But please allow me to suggest
that not all DSA switches are like this, and that this is a pretty weak
justification for placing a phylink_mac_link_down call in no other place
than dsa_port_link_register_of. If this is an indication of anything,
the two DSA drivers that I maintain have worked just fine in the time
frame between the DSA conversion to forcing the link in mac_link_up and
the DSA change to force a mac_link_down before connecting to phylink, so
they do not need that change.

Therefore, I believe that it isn't fair to create avoidable baggage for
other drivers, which may end up depending, without even realizing it, on
this non-standard arrangement of phylink calls. If the mac_link_down
call had been in phylink, I wouldn't have had any problem. Same if the
call had been initiated by mv88e6xxx itself.

Is there any technical reason why the mv88e6xxx driver (or others, if
others exist) cannot turn off its ports by itself and needs to be driven
by an external phylink_mac_link_down call to do that (with extra care
taken that the port is able to be turned back on again by phylink if
needed)? It can't be that it can't compute the arguments to call the
function with - because they aren't correct in the current form of the
code either. It also can't be due to the timing, because we are here:

static int dsa_tree_setup_switches(struct dsa_switch_tree *dst)
{
	struct dsa_port *dp;
	int err;

	list_for_each_entry(dp, &dst->ports, list) {
		err = dsa_switch_setup(dp->ds);
		/* -> calls ->setup() */
		if (err)
			goto teardown;
	}

	list_for_each_entry(dp, &dst->ports, list) {
		err = dsa_port_setup(dp);
		/*
		 * -> calls dsa_port_link_register_of()
		 *    -> calls phylink_mac_link_down()
		 */
		if (err) {
			err = dsa_port_reinit_as_unused(dp);
			if (err)
				goto teardown;
		}
	}

So since we are positioned where we are in the DSA initialization
sequence, forcing the CPU ports down at the end of ->setup() should be
close enough temporally to where it is done now?
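
A rough sketch of that idea, as a standalone toy in C rather than real
driver code: the setup routine itself forces its CPU ports down as its
last step, and a later link-up callback turns them back on. Every toy_*
name here is hypothetical; toy_setup() stands in for a driver's
->setup() and toy_mac_link_up() for its mac_link_up handler, and neither
reflects the actual mv88e6xxx or DSA signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PORTS	3

enum toy_port_type { TOY_PORT_USER, TOY_PORT_CPU };

struct toy_port {
	enum toy_port_type type;
	bool forced;		/* link state overridden by software */
	bool forced_up;		/* forced value, meaningful only if forced */
};

struct toy_switch {
	struct toy_port ports[TOY_MAX_PORTS];
	size_t num_ports;
};

/* Hypothetical driver-private helper: no external caller is needed. */
static void toy_port_force_link_down(struct toy_port *p)
{
	p->forced = true;
	p->forced_up = false;
}

/* Stand-in for a driver's ->setup(). */
static int toy_setup(struct toy_switch *sw)
{
	size_t i;

	assert(sw->num_ports <= TOY_MAX_PORTS);

	/* ... the rest of the switch initialization would go here ... */

	/* Last step: bring CPU ports to a known, forced-down state. */
	for (i = 0; i < sw->num_ports; i++)
		if (sw->ports[i].type == TOY_PORT_CPU)
			toy_port_force_link_down(&sw->ports[i]);

	return 0;
}

/* Stand-in for the link-up callback that runs later, during port setup. */
static void toy_mac_link_up(struct toy_port *p)
{
	p->forced = true;
	p->forced_up = true;
}
```

In this toy, whatever state a CPU port was left in before toy_setup()
is overwritten with "forced down", user ports are not touched, and the
port remains able to come back up when toy_mac_link_up() is called on
it afterwards.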