lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2428ec56-f2db-4769-aaca-ca09e57b8162@lunn.ch>
Date: Thu, 28 Nov 2024 15:54:52 +0100
From: Andrew Lunn <andrew@...n.ch>
To: Krzysztof Hałasa <khalasa@...p.pl>
Cc: netdev <netdev@...r.kernel.org>, Oliver Neukum <oneukum@...e.com>,
	Andrew Lunn <andrew+netdev@...n.ch>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org,
	Jose Ignacio Tornos Martinez <jtornosm@...hat.com>,
	Ming Lei <ming.lei@...hat.com>
Subject: Re: [PATCH] PHY: Fix no autoneg corner case

On Thu, Nov 28, 2024 at 07:31:48AM +0100, Krzysztof Hałasa wrote:
> Andrew,
> 
> Andrew Lunn <andrew@...n.ch> writes:
> 
> >> Unfortunately it's initially set based on the supported capability
> >> rather than the actual hw setting.
> >
> > We need a clear definition of 'initially', and when does it actually
> > matter.
> >
> > Initially, things like speed, duplex and set to UNKNOWN. They don't
> > make any sense until the link is up. phydev->advertise is set to
> > phydev->supported, so that we advertise all the capabilities of the
> > PHY. However, at probe, this does not really matter, it is only when
> > phy_start() is called is the hardware actually configured with what it
> > should advertise, or even if it should do auto-neg or not.
> >
> > In the end, this might not matter.
> 
> Nevertheless, it seems it does matter.
> 
> >> While in most cases there is no
> >> difference (i.e., autoneg is supported and on by default), certain
> >> adapters (e.g. fiber optics) use fixed settings, configured in hardware.
> >
> > If the hardware is not capable of supporting autoneg, why is autoneg
> > in phydev->supported? To me, that is the real issue here.
> 
> Well, autoneg *IS* supported by the PHY in this case.
> No autoneg in phydev->supported would mean I can't enable it if needed,
> wouldn't it?
> 
> It is supported but initially disabled.
> 
> With current code, PHY correctly connects to the other side, all the
> registers are valid etc., the PHY indicates, for example, a valid link
> with 100BASE-FX full duplex etc.
> 
> Yet the Linux netdev, ethtool etc. indicate no valid link, autoneg on,
> and speed/duplex unknown. It's just completely inconsistent with the
> real hardware state.
> 
> It seems the phy/phylink code assumes the PHY starts with autoneg
> enabled (if supported). This is simply an incorrect assumption.

This is sounding like a driver bug. When phy_start() is called it
kicks off the PHY state machine. That should result in
phy_config_aneg() being called. That function is badly named, since it
is used both for autoneg and forced setting. The purpose of that call
is to configure the PHY to the configuration stored in
phydev->advertise, etc. So if the PHY by hardware defaults has autoneg
disabled, but the configuration in phydev says it should be enabled,
calling phy_config_aneg() should actually enabled autoneg. It is
possible there is a phylib bug here, because we try to not to kick off
autoneg if it is not needed, because it is slow. I've not looked at
the code, but it could be we see there is link, and skip calling
phy_config_aneg()? Maybe try booting with the cable disconnected so
there is no link?

> BTW if the code meant to enable autoneg, it would do exactly that -
> enable it by writing to PHY command register.

Assuming bug free code.

> Then the hw and sw state
> would be consistent again (though initial configuration would be
> ignored, not very nice). Now the code doesn't enable autoneg, it only
> *indicates* it's enabled and in reality it's not.

I would say there are two different issues here.

1) It seems like we are not configuring the hardware to match phydev.
2) We are overwriting how the bootloader etc configured the hardware.

2) is always hard, because how do we know the PHY is not messed up
from a previous boot/crash cycle etc. In general, a driver should try
to put the hardware into a well known state. If we have a clear use
case for this, we can consider how to implement it.

	Andrew

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ