netdev - Re: [PATCH net-next v2 7/9] net: phy: introduce ethtool_phy

Message-ID: <f1af0323-23f5-44fd-a980-686815957b5a@lunn.ch>
Date: Tue, 8 Oct 2024 15:00:53 +0200
From: Andrew Lunn <andrew@...n.ch>
To: Maxime Chevallier <maxime.chevallier@...tlin.com>
Cc: "Russell King (Oracle)" <linux@...linux.org.uk>, davem@...emloft.net,
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
	thomas.petazzoni@...tlin.com, Jakub Kicinski <kuba@...nel.org>,
	Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
	linux-arm-kernel@...ts.infradead.org,
	Christophe Leroy <christophe.leroy@...roup.eu>,
	Herve Codina <herve.codina@...tlin.com>,
	Florian Fainelli <f.fainelli@...il.com>,
	Heiner Kallweit <hkallweit1@...il.com>,
	Vladimir Oltean <vladimir.oltean@....com>,
	Marek Behún <kabel@...nel.org>,
	Köry Maincent <kory.maincent@...tlin.com>,
	Oleksij Rempel <o.rempel@...gutronix.de>
Subject: Re: [PATCH net-next v2 7/9] net: phy: introduce ethtool_phy_ops to
 get and set phy configuration

> > So you have at least regulators under Linux control? Is that what you
> > mean by power down? Pulling the plug and putting it back again is
> > somewhat different to isolation. All its state is going to be lost,
> > meaning phylib needs to completely initialise it again. Or can you
> > hide this using PM? Just suspend/resume it?
> 
> Ah no, I wasn't referring to regulators but rather the BMCR PDOWN bit to
> just shut the PHY down, as in suspend.

Ah! I wonder what 802.3 says about PDOWN? Does it say anything about
it being equivalent to ISOLATE? That the pins go HI-Z? Are we talking
about something semi-reliable, or something which just happens to work
for this PHY?
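
For reference, PDOWN and ISOLATE are two separate bits in the Clause 22
basic mode control register (register 0). A minimal sketch, in Python
rather than kernel C for brevity: the two bit values match BMCR_ISOLATE
and BMCR_PDOWN in linux/mii.h, while the helper names are hypothetical.
It only shows the register bits, and says nothing about whether a given
PHY puts its MII pins in HI-Z when powered down.

```python
# Clause 22 BMCR (register 0) bits, same values as in linux/mii.h.
BMCR_ISOLATE = 0x0400  # bit 10: isolate the PHY from the MII
BMCR_PDOWN = 0x0800    # bit 11: power down

def set_bits(bmcr, mask):
    """Return the 16-bit register value with the bits in mask set."""
    return (bmcr | mask) & 0xFFFF

def clear_bits(bmcr, mask):
    """Return the 16-bit register value with the bits in mask cleared."""
    return bmcr & ~mask & 0xFFFF

def describe(bmcr):
    """Hypothetical helper: name which of the two bits are set."""
    flags = []
    if bmcr & BMCR_PDOWN:
        flags.append("power-down")
    if bmcr & BMCR_ISOLATE:
        flags.append("isolate")
    return ", ".join(flags) or "normal operation"
```
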

> Indeed the state is lost. The way I'm supporting this is :
> 
>  - If one PHY has the link, it keeps it until link-down
>  - When link-down, I round-robin between the 2 phys:
> 
>   - Attach the PHY to the netdev
>   - See if it can establish link and negotiate with LP
>   - If there's nothing after a given period ( 2 seconds default ), then
> I detach the PHY, attach the other one, and start again, until one of
> them has link.

This sounds pretty invasive to the MAC driver. I don't think you need
to attach/detach each cycle, since you don't need to send/receive any
packets. You could hide this all in phylib. But that should be
considered as part of the bigger picture.

I assume it is not actually 2 seconds, but some random number in the
range 1-3 seconds, so when both ends are searching they do eventually
find each other?
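
Why a fixed period is a problem can be shown with a small model. This is
a sketch under assumptions, not the driver's logic: both ends alternate
between two cables, end A starts on cable 0 and end B on cable 1, and a
link is assumed to need 500 ms of both ends sitting on the same cable.
The function name, the 500 ms figure and the 100 ms tick are all
hypothetical.

```python
import random

def time_to_meet(dwell_a, dwell_b, need_ms=500, horizon_ms=600_000, tick_ms=100):
    """Return the time in ms at which both ends have stayed on the same
    cable for need_ms, or None if that never happens within horizon_ms.

    dwell_a and dwell_b are zero-argument callables giving how long (ms)
    that end stays on one PHY before switching to the other.
    """
    port = [0, 1]                      # worst case: ends start on different cables
    dwell = [dwell_a, dwell_b]
    remaining = [dwell_a(), dwell_b()]
    together = 0
    t = 0
    while t < horizon_ms:
        t += tick_ms
        together = together + tick_ms if port[0] == port[1] else 0
        if together >= need_ms:
            return t
        for end in (0, 1):
            remaining[end] -= tick_ms
            if remaining[end] <= 0:    # dwell expired: switch to the other PHY
                port[end] ^= 1
                remaining[end] = dwell[end]()
    return None

rng = random.Random(1)
fixed = time_to_meet(lambda: 2000, lambda: 2000)
jittered = time_to_meet(lambda: rng.randint(1000, 3000),
                        lambda: rng.randint(1000, 3000))
```

With a fixed 2000 ms on both ends the two sides switch in lockstep and
never share a cable, so `fixed` is None. With a random dwell in the 1-3
second range the phases drift apart and `jittered` is a finite time.
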

> > That explains the hardware, but what are the use cases? How did the
> > hardware designer envision this hardware being used?
> 
> The use-case is link redundancy, if one PHY loses the link, we hope
> that we still have link on the other one and switchover. This is one of
> the things I discussed at netdev 0x17.

> > If you need to power the PHY off, you cannot have dynamic behaviour
> > where the first to have link wins. But if you can have the media side
> > functional, you can do some dynamic behaviours.
> 
> True.
> 
> > Although, is it wise
> > for the link to come up, yet to be functionally dead because it has no
> > MAC connected?
> 
> Good point. What would you think ? I already deal with the identified
> issue which is that both PHYs are link-up with LP, both connected to
> the same switch. When we switch between the active PHYs, we send a
> gratuitous ARP on the new PHY to refresh the switch's FDB.

It seems odd to me you have redundant cables going to one switch? I
would have the cables going in opposite directions, to two different
switches, and have the switches in, at a minimum, a ring, or ideally a
mesh.

I don't think the ARP is necessary. The link peer switch should flush
its tables when the link goes down. But switches further away don't
see such link events, yet they learn about the new location of the
host. I would also expect the host sees a loss of carrier and then the
carrier restored, which probably flushes all its tables, so it is
going to ARP anyway.

> 
> Do you see that as being an issue, having the LP see link-up when the
> link cannot actually convey data ? Besides the energy detect feature
> you mention, I don't see what other options we can have unfortunately :(

Maybe see what 802.3 says about advertising with no link
modes. Autoneg should complete, in that the peers exchange messages,
but the result of the autoneg is that they have no common modes, so
the link won't come up. Is it clearly defined what should happen in
this case? But we are in a corner case, similar to ISOLATE, which I
guess rarely gets tested, so is often broken. I would guess power
detection would be more reliable when implemented.
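
The resolution step itself is easy to model. A minimal sketch, assuming
only the four 10/100 copper modes and the ADVERTISE_* bit values from
linux/mii.h (100BASE-T4 and gigabit are left out); the mode names and
the resolve() helper are hypothetical. It models the outcome of a
completed exchange only. What a real PHY does on the wire when it
advertises nothing is exactly the open question here.

```python
# Advertisement register bits, same values as in linux/mii.h.
ADVERTISE_10HALF = 0x0020
ADVERTISE_10FULL = 0x0040
ADVERTISE_100HALF = 0x0080
ADVERTISE_100FULL = 0x0100

# Highest priority first (100BASE-T4 omitted from this sketch).
PRIORITY = [
    (ADVERTISE_100FULL, "100baseT/Full"),
    (ADVERTISE_100HALF, "100baseT/Half"),
    (ADVERTISE_10FULL, "10baseT/Full"),
    (ADVERTISE_10HALF, "10baseT/Half"),
]

def resolve(local_adv, lp_adv):
    """Return the best mode both sides advertise, or None if there is none."""
    common = local_adv & lp_adv
    for bit, name in PRIORITY:
        if common & bit:
            return name
    return None  # no common mode: the link should not come up
```
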

> > There are some Marvell Switches which support both internal Copper
> > PHYs and a SERDES port. The hardware allows first to get link to have
> > a functional MAC. But in Linux we have not supported that, and we
> > leave the unused part down so it does not get link.
> 
> My plan is to support these as well. For the end-user, it makes no
> difference whether the HW internally has 2 PHYs each with one port, or 1
> phy with 2 ports. So to me, if we want to support phy_mux, we should
> also support the case you mention above. I have some code to support
> this, but that's the part where I'm still getting things ironed-out,
> this is pretty tricky to represent that properly, especially in DT.
> 
> >
> > Maybe we actually want energy detect, not link, to decide which PHY
> > should get the MAC?  But i have no real idea what you can do with
> > energy detect, and it would also mean building out the read_status()
> > call to report additional things, etc.
> 
> Note that I'm trying to support a bigger set of use-cases besides the
> pure 2-PHY setup. One being that we have a MUX within the SoC on the
> SERDES lanes, allowing to steer the MII interface between a PHY and an
> SFP bus (Turris Omnia has such a setup). Is it possible to have an
> equivalent "energy detect" on all kinds of SFPs ?

The LOS pin, which indicates if there is light entering the SFP.

> As a note, I do see that both Russell and you may think you're being
> "drip-fed" (I learned that term today) information, that's not my
> intent at all, I wasn't expecting this discussion now, sorry about that.

It is a difficult set of problems, and you are addressing it from the
very niche end first using mechanisms which I expect are not reliably
implemented. So we are going to ask lots of questions.

You probably would have got fewer questions if you had started with the
use cases for the Turris Omnia and Marvell Ethernet switch, which are
more mainstream, and then extended it with your niche device. But I
can understand this order, you probably have a customer with this
niche device...

	Andrew