lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <26d9cccd-2351-483f-a417-03d484fd25a4@lunn.ch>
Date: Wed, 15 Oct 2025 22:44:31 +0200
From: Andrew Lunn <andrew@...n.ch>
To: "Russell King (Oracle)" <rmk+kernel@...linux.org.uk>
Cc: Heiner Kallweit <hkallweit1@...il.com>,
	Abhishek Chauhan <quic_abchauha@...cinc.com>,
	Alexandre Torgue <alexandre.torgue@...s.st.com>,
	Alexis Lothore <alexis.lothore@...tlin.com>,
	Andrew Lunn <andrew+netdev@...n.ch>,
	Boon Khai Ng <boon.khai.ng@...era.com>,
	Choong Yong Liang <yong.liang.choong@...ux.intel.com>,
	Daniel Machon <daniel.machon@...rochip.com>,
	"David S. Miller" <davem@...emloft.net>,
	Drew Fustini <dfustini@...storrent.com>,
	Emil Renner Berthing <emil.renner.berthing@...onical.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Faizal Rahim <faizal.abdul.rahim@...ux.intel.com>,
	Furong Xu <0x1207@...il.com>, Inochi Amaoto <inochiama@...il.com>,
	Jacob Keller <jacob.e.keller@...el.com>,
	Jakub Kicinski <kuba@...nel.org>,
	"Jan Petrous (OSS)" <jan.petrous@....nxp.com>,
	Jisheng Zhang <jszhang@...nel.org>, Kees Cook <kees@...nel.org>,
	Kunihiko Hayashi <hayashi.kunihiko@...ionext.com>,
	Lad Prabhakar <prabhakar.mahadev-lad.rj@...renesas.com>,
	Ley Foon Tan <leyfoon.tan@...rfivetech.com>,
	linux-arm-kernel@...ts.infradead.org, linux-arm-msm@...r.kernel.org,
	linux-stm32@...md-mailman.stormreply.com,
	Matthew Gerlach <matthew.gerlach@...era.com>,
	Maxime Chevallier <maxime.chevallier@...tlin.com>,
	Maxime Coquelin <mcoquelin.stm32@...il.com>,
	Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>,
	netdev@...r.kernel.org, Oleksij Rempel <o.rempel@...gutronix.de>,
	Paolo Abeni <pabeni@...hat.com>,
	Rohan G Thomas <rohan.g.thomas@...era.com>,
	Shenwei Wang <shenwei.wang@....com>,
	Simon Horman <horms@...nel.org>,
	Song Yoong Siang <yoong.siang.song@...el.com>,
	Swathi K S <swathi.ks@...sung.com>,
	Tiezhu Yang <yangtiezhu@...ngson.cn>, Vinod Koul <vkoul@...nel.org>,
	Vladimir Oltean <olteanv@...il.com>,
	Vladimir Oltean <vladimir.oltean@....com>,
	Yu-Chun Lin <eleanor15x@...il.com>
Subject: Re: [PATCH net-next 01/14] net: stmmac: remove broken PCS code

On Wed, Oct 15, 2025 at 03:20:02PM +0100, Russell King (Oracle) wrote:
> Changing the netif_carrier_*() state behind phylink's back has always
> been prohibited because it messes up with phylinks state tracking, and
> means that phylink no longer guarantees to call the mac_link_down()
> and mac_link_up() methods at the appropriate times.  This was later
> documented in the sfp-phylink network driver conversion guide.
> 
> stmmac was converted to phylink in 2019, but nothing was done with the
> "PCS" code. Since then, apart from the updates as part of phylink
> development, nothing has happened with stmmac to improve its use of
> phylink, or even to address this point.
> 
> A couple of years ago, a has_integrated_pcs boolean was added by Bart,
> which later became the STMMAC_FLAG_HAS_INTEGRATED_PCS flag, to avoid
> manipulating the netif_carrier_*() state. This flag is mis-named,
> because whenever the stmmac is synthesized for its native SGMII, TBI
> or RTBI interfaces, it has an "integrated PCS". This boolean/flag
> actually means "ignore the status from the integrated PCS".
> 
> Discussing with Bart, the reasons for this are lost to the winds of
> time (which is why we should always document the reasons in the commit
> message.)
> 
> RGMII also has in-band status, and the dwmac cores and stmmac code
> supports this but with one bug that saves the day.
> 
> When dwmac cores are synthesised for RGMII only, they do not contain
> an integrated PCS, and so priv->dma_cap.pcs is clear, which prevents
> (incorrectly) the "RGMII PCS" being used, meaning we don't read the
> in-band status. However, a core synthesised for RGMII and also SGMII,
> TBI or RTBI will have this capability bit set, thus making these
> code paths reachable.
> 
> The Jetson Xavier NX uses RGMII mode to talk to its PHY, and removing
> the incorrect check for priv->dma_cap.pcs reveals the theortical issue
> with netif_carrier_*() manipulation is real:
> 
> dwc-eth-dwmac 2490000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0
> dwc-eth-dwmac 2490000.ethernet eth0: PHY [stmmac-0:00] driver [RTL8211F Gigabit Ethernet] (irq=141)
> dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features support found
> dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported
> dwc-eth-dwmac 2490000.ethernet eth0: registered PTP clock
> dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii-id link mode
> 8021q: adding VLAN 0 to HW filter on device eth0
> dwc-eth-dwmac 2490000.ethernet eth0: Adding VLAN ID 0 is not supported
> Link is Up - 1000/Full
> Link is Down
> Link is Up - 1000/Full
> 
> This looks good until one realises that the phylink "Link" status
> messages are missing, even when the RJ45 cable is reconnected. Nothing
> one can do results in the interface working. The interrupt handler
> (which prints those "Link is" messages) always wins over phylink's
> resolve worker, meaning phylink never calls the mac_link_up() nor
> mac_link_down() methods.
> 
> eth0 also sees no traffic received, and is unable to obtain a DHCP
> address:
> 
> 3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group defa
> ult qlen 1000
>     link/ether e6:d3:6a:e6:92:de brd ff:ff:ff:ff:ff:ff
>     RX: bytes  packets  errors  dropped overrun mcast
>     0          0        0       0       0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     27686      149      0       0       0       0
> 
> With the STMMAC_FLAG_HAS_INTEGRATED_PCS flag set, which disables the
> netif_carrier_*() manipulation then stmmac works normally:
> 
> dwc-eth-dwmac 2490000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0
> dwc-eth-dwmac 2490000.ethernet eth0: PHY [stmmac-0:00] driver [RTL8211F Gigabit Ethernet] (irq=141)
> dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features support found
> dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported
> dwc-eth-dwmac 2490000.ethernet eth0: registered PTP clock
> dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii-id link mode
> 8021q: adding VLAN 0 to HW filter on device eth0
> dwc-eth-dwmac 2490000.ethernet eth0: Adding VLAN ID 0 is not supported
> Link is Up - 1000/Full
> dwc-eth-dwmac 2490000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
> 
> and packets can be transferred.
> 
> This clearly shows that when priv->hw->pcs is set, but
> STMMAC_FLAG_HAS_INTEGRATED_PCS is clear, the driver reliably fails.
> 
> Discovering whether a platform falls into this is impossible as
> parsing all the dtsi and dts files to find out which use the stmmac
> driver, whether any of them use RGMII or SGMII and also depends
> whether an external interface is being used. The kernel likely
> doesn't contain all dts files either.
> 
> The only driver that sets this flag uses the qcom,sa8775p-ethqos
> compatible, and uses SGMII or 2500BASE-X.
> 
> but these are saved from this problem by the incorrect check for
> priv->dma_cap.pcs.
> 
> So, we have to assume that for every other platform that uses SGMII
> with stmmac is using an external PCS.
> 
> Moreover, ethtool output can be incorrect. With the full-duplex link
> negotiated, ethtool reports:
> 
>         Speed: 1000Mb/s
>         Duplex: Half
> 
> because with dwmac4, the full-duplex bit is in bit 16 of the status,
> priv->xstats.pcs_duplex becomes BIT(16) for full duplex, but the
> ethtool ksettings duplex member is u8 - so becomes zero. Moreover,
> the supported, advertised and link partner modes are all "not
> reported".
> 
> Finally, ksettings_set() won't be able to set the advertisement on
> a PHY if this PCS code is activated, which is incorrect when SGMII
> is used with a PHY.
> 
> Thus, remove:
> 1. the incorrect netif_carrier_*() manipulation.
> 2. the broken ethtool ksettings code.
> 
> Given that all uses of STMMAC_FLAG_HAS_INTEGRATED_PCS are now gone,
> remove the flag from stmmac.h and dwmac-qcom-ethqos.c.
> 
> Signed-off-by: Russell King (Oracle) <rmk+kernel@...linux.org.uk>

Reviewed-by: Andrew Lunn <andrew@...n.ch>

    Andrew

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ