lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <19ee2170-649c-3bbb-7fba-8834d5ba9372@seco.com>
Date:   Mon, 27 Jun 2022 11:17:13 -0400
From:   Sean Anderson <sean.anderson@...o.com>
To:     "Russell King (Oracle)" <linux@...linux.org.uk>
Cc:     "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Madalin Bucur <madalin.bucur@....com>, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        Paolo Abeni <pabeni@...hat.com>,
        Eric Dumazet <edumazet@...gle.com>
Subject: Re: [PATCH net-next 25/28] [RFC] net: dpaa: Convert to phylink



On 6/23/22 8:24 PM, Russell King (Oracle) wrote:
> On Thu, Jun 23, 2022 at 06:39:08PM -0400, Sean Anderson wrote:
>> Hi Russell,
>> 
>> On 6/18/22 11:58 AM, Sean Anderson wrote:
>> > Hi Russell,
>> > 
>> > On 6/18/22 4:22 AM, Russell King (Oracle) wrote:
>> >> On Fri, Jun 17, 2022 at 08:45:38PM -0400, Sean Anderson wrote:
>> >>> Hi Russell,
>> >>>
>> >>> Thanks for the quick response.
>> >>> ...
>> >>> Yes, I've been using the debug prints in phylink extensively as part of
>> >>> debugging :)
>> >>>
>> >>> In this case, I added a debug statement to phylink_resolve printing out
>> >>> cur_link_state, link_state.link, and pl->phy_state.link. I could see that
>> >>> the phy link state was up and the mac (pcs) state was down. By inspecting
>> >>> the PCS's registers, I determined that this was because AN had not completed
>> >>> (in particular, the link was up in BMSR). I believe that forcing in-band-status
>> >>> (by setting ovr_an_inband) shouldn't be necessary, but I was unable to get a link
>> >>> up on any interface without it. In particular, the pre-phylink implementation
>> >>> disabled PCS AN only for fixed links (which you can see in patch 23).
>> >>
>> >> I notice that prior to patch 23, the advertisment register was set to
>> >> 0x4001, but in phylink_mii_c22_pcs_encode_advertisement() we set it to
>> >> 0x0001 (bit 14 being the acknowledge bit from the PCS to the PHY, which
>> >> is normally managed by hardware.
>> >>
>> >> It may be worth testing whether setting bit 14 changes the behaviour.
>> > 
>> > Thanks for the tip. I'll try that out on Monday.
>> 
>> Well, I was playing around with this some more, and I found that I could enable
>> it if I set one of the 10G lanes to SGMII. Not sure what's going on there. It's
>> possible one of the lanes is mismatched, but I'm still looking into it.
>> 
>> ---
>> 
>> How is rate adaptation in the phy supposed to work? One of the 10G interfaces on
>> the RDB is hooked up to an AQR113 which can adapt rates below 10G to XFI using
>> pause frames.
> 
> Rate adaption support isn't something that phylink officially supports.
> It can be bodged around (and some drivers do) but that's the official
> line - there is no code in phylink to make it work.

Ah, that's unfortunate. I removed some Aquantia-specific code, but I may
have to add it back...

> For example, if you have a PHY that's doing rate adaption, and the PHY
> reports what has been negotiated on the media side. That gets reported
> back to the PCS and MAC. The only way these blocks can tell that there's
> something going on is if they say "hey, but the link to the PHY is
> operating at 10G and the media speed is 100M, so something fishy is
> going on, the PHY must be doing rate adaption."
> 
> That's the bottom line to this - phylink doesn't yet support rate
> adaption by any of the blocks - mainly because there is no way for
> any of those blocks to indicate that they're doing rate adaption.
> 
> The implementation of phylink_generic_validate() assumes there is no
> rate adaption (as per the current design of phylink).
> 
> The reason phylink_generic_validate() has come into existence recently
> is to (1) get rid of the numerous almost identical but buggy
> implementations of the validate() method, and (2) to eventually allow
> me to get rid of the validate() method. The validate() method is not
> very well suited to systems with rate adaption - as validate() stands,
> every MAC today that _could_ be connected with something that does rate
> adaption needs to have special handling in that method when that is
> true - that clearly isn't a good idea when it's dependent on the
> properties of the devices towards the media from the MAC.
> 
> Ocelot does make rate adaption work by doing exactly this - it has its
> own validate() method that returns all the link modes that it wishes
> the system to support, and it ignores some of what phylink communicates
> via the link_up() callbacks such as rx_pause. This means this MAC driver
> wouldn't behave correctly as a system if rate adaption wasn't present.
> 
> 
> Now, the thing about rate adaption is there are several different ways
> to do it - and Marvell 88x3310 illustrates them both, because this PHY
> supports rate adaption but depending on whether the PHY has MACSEC
> hardware support or not depende on its behaviour:
> 
> - If no MACSEC, then the PHY requires that the MAC paces the rate at
>   which packets are sent, otherwise the PHYs FIFOs will overflow.
>   Therefore, the MAC must know: (1) the speed of the media side, and
>   (2) that the PHY requires this behaviour. Marvell even go as far as
>   stating that the way to achieve this is to extend the IPG in the MACs
>   settings.
> 
> - If MACSEC, then the PHY sends pause frames back to the MAC to rate
>   limit the packet rate from the MAC. Therefore, the MAC must accept
>   pause frames to throttle the transmit rate whether or not pause
>   frames were negotiated on the media side.
> 
> So, doing this right, you need knowledge of the rate adaption
> implementation - there isn't a "generic" solution to this. It isn't
> just a case of "allow all speeds at the media side at or below PHY
> interface speed" although that is part of it. (More on this below.)



>> This is nice and all, but the problem is that phylink_get_linkmodes
>> sees that we're using PHY_INTERFACE_MODE_10GKR and doesn't add any
>> of the lower link speeds (just MAC_10000).
> 
> Do you really have a 10GBASE-KR link - a backplane link? This has
> negotiation embedded in it. Or do you have a link that is using the
> 10GBASE-R protocol? (Please don't use PHY_INTERFACE_MODE_10GKR unless
> you really have a 10GBASE-KR link as defined by 802.3).

 Sorry, that was a typo.

>> This results in ethtool output of
>> 
>> Settings for eth6:
>> 	Supported ports: [  ]
>> 	Supported link modes:   10000baseT/Full
>> 	                        10000baseKX4/Full
>> 	                        10000baseKR/Full
>> 	Supported pause frame use: Symmetric Receive-only
>> 	Supports auto-negotiation: Yes
>> 	Supported FEC modes: Not reported
>> 	Advertised link modes:  10000baseT/Full
>> 	                        10000baseKX4/Full
>> 	                        10000baseKR/Full
>> 	Advertised pause frame use: Symmetric Receive-only
>> 	Advertised auto-negotiation: Yes
>> 	Advertised FEC modes: Not reported
>> 	Link partner advertised link modes:  10baseT/Half 10baseT/Full
>> 	                                     100baseT/Half 100baseT/Full
>> 	Link partner advertised pause frame use: Symmetric
>> 	Link partner advertised auto-negotiation: Yes
>> 	Link partner advertised FEC modes: Not reported
>> 	Speed: Unknown!
>> 	Duplex: Unknown! (255)
>> 	Auto-negotiation: on
>> 	Port: MII
>> 	PHYAD: 0
>> 	Transceiver: external
>>         Current message level: 0x00002037 (8247)
>>                                drv probe link ifdown ifup hw
>> 	Link detected: yes
>> 
>> The speed and duplex are "Unknown!" because the negotiated link mode (100Base-TX)
>> doesn't intersect with the advertised link modes (10000Base-T etc). This is
>> currently using genphy; does there need to be driver support for this sort of thing?
> 
> Without knowing whether this is a clause 22 or clause 45 PHY, I'd just
> be guessing, but...

It's a c45 phy.

> genphy's C45 support is rudimentary and should not be used.

Fun. TIL

> genphy's C22 support is better for basic control of PHYs but should not
> be used if there's a more specific driver.
> 
> If this is a C22 PHY, I'm surprised that it managed to link with its
> partner - we should have cleared anything but the 10000M modes in the
> PHY which should have caused the media side autonegotiation to fail.

I looked into this further, and it seems like the phy has an "Autonegotiation
Vendor Provisioning" register. In this register there is a "User Provided
Autonegotiation Data" field which defaults to 0. This causes "the PHY [to]
construct the correct autonegotiation words based on the provisioned values."
AKA the programmed advertisement is ignored. This is very convenient for me
because otherwise the link would not have come up :)

> However, with the Ocelot-style workaround I mentioned above, that would
> allow the 100M speeds to be advertised, and phylib would then be able
> to resolve them to the appropriate speed/duplex. I don't condone doing
> that though, I'd prefer a proper solution to this problem.
> 
>> Should the correct speed even be reported here? The MAC and PCS still need to be
>> configured for XFI.
>> 
>> Another problem is that the rate adaptation is supposed to happen with pause frames.
>> Unfortunately, pause frames are disabled:
>> 
>> Pause parameters for eth6:
>> Autonegotiate:	on
>> RX:		off
>> TX:		off
>> RX negotiated: on
>> TX negotiated: on
> 
> I think you're misreading that - don't worry, I don't think you're the
> only one.
> 
> "Autonegotiate" is the value of ethtool's pauseparam "autoneg" setting
> which determines whether the resutl of autonegotiation is used or
> whether manual settings are used.
> 
> "RX" and "TX" are the manual settings, which will force pause frame
> reception and transmission gating when "Autonegotiate" is off. These
> can be read-modify-written (and are by ethtool) so it's important
> that they return what was previously configured, not what the hardware
> is doing. See do_spause() in the ethtool source code.

Ah, I missed that these were from link_config and not link_status.

> "RX negotiated" and "TX negotiated" are ethtool's own derivation from
> our and link-partner advertisements and in no way reflect what the
> hardware is actually doing. These reflect what was negotiated on the
> media between the PHYs.
> 
> See dump_pause() in the ethtool source code for the function that
> produces the output you quoted above.
> 
> Phylink's "Link is Up" message gives the details for the link - the
> speed and duplex will be the media side of the link (which is what gets
> passed in all the link_up() methods). The pause settings come from the
> media side negotiation if pause autoneg is enabled, otherwise they come
> from the pauseparam forced modes. I think this should only ever report
> the media-negotiated settings.

[    8.029403] fsl_dpaa_mac 1af0000.ethernet eth6: Link is Up - 10Gbps/Full - flow control off

So the pause parameters are still off.

> If we need support for rate adaption with pause frames, then you are
> right that we need the MAC to be open to receive those frames, and
> right now, as I said above, there is no support in phylink at present
> to make that happen. I'm not saying there shouldn't be, I'm just saying
> that's how it is today.
> 
> In order to do this, we would need to have some way of knowing that:
> (a) the PHY is a rate adapting PHY (which means it must not use genphy.)
> (b) the PHY is will send pause frames towards the MAC to pace it.
> 
> This would need to be added to phylib, and then phylink can query
> phylib for that information and, when telling the MAC that the link
> is up, also enable rx_pause.
> 
> The same is true at the PCS level - we don't have any way to know if
> a PCS is doing rate adaption, so until we have a way to know that,
> phylink can't enable rx_pause.
> 
> There is one final issue that needs to be considered - what if the
> PHY is a rate adapting PHY which sends pause frames, but it has been
> coupled with a MAC that doesn't have support to act on those pause
> frames? Do we print a warning? Do we refuse to bring the link up?
> Do we fall back to requiring the MAC to increase the IPG? What if the
> MAC isn't capable of increasing the IPG? How do we tell the MAC to
> increase the IPG, another flag in its link_up() method?

I'd say that failing to bring the link up would probably be the most
reasonable thing to do in most cases. The IPG would need to be increased
to seriously unusual levels for correct adaptation. DPAA can't have an IPG
over 252 bytes, and I imagine that that's a fairly common restriction (though
since you bring it up as an option, there probably exists a MAC supporting the
10 KiB IPGs necessary for 1G over 10G).

>> Maybe this is because phylink_mii_c45_pcs_get_state doesn't check for pause modes?
> 
> With a 10GBASE-R PCS, there is no in-band status on the link, and so
> there is no communication of pause frame negotiation status to the
> PCS - meaning, there is no way to read it from the PCS.

Yup. I tried enabling autonegotiation in the PCS but got nothing.

> Let me be clear about this: this is a shortcoming of phylink, but
> phylink had to start somewhere, and all the hardware I have does not
> support rate adaption.
> 
> I'd like this problem to get solved - I have some experimental patches
> that allow a PCS to say "hey, I'm doing rate adaption, don't bother
> with the MAC validation" but I get the feeling that's not really
> sufficient.
> 
> Anyway, I'm afraid it's very late here, so I can't put any more
> thought into this tonight, but I hope the above is at least helpful
> and gives some ideas what needs to be done to solve this.

I really appreciate it. This has cleared things up for me.

--Sean

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ