linux-kernel - Re: [PATCH net v3 2/2] net: fec: Suspend the PHY on probe

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <fd57c135-d4bb-4d73-badc-bd91810d045c@actia.se>
Date: Thu, 28 Mar 2024 11:52:23 +0000
From: John Ernberg <john.ernberg@...ia.se>
To: Florian Fainelli <f.fainelli@...il.com>, "Russell King (Oracle)"
	<linux@...linux.org.uk>
CC: Maxime Chevallier <maxime.chevallier@...tlin.com>, Wei Fang
	<wei.fang@....com>, Shenwei Wang <shenwei.wang@....com>, Clark Wang
	<xiaoning.wang@....com>, NXP Linux Team <linux-imx@....com>, "David S.
 Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, "Jakub
 Kicinski" <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Heiner Kallweit
	<hkallweit1@...il.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Andrew Lunn
	<andrew@...n.ch>
Subject: Re: [PATCH net v3 2/2] net: fec: Suspend the PHY on probe

Hi Florian,

On 3/25/24 10:27 PM, Florian Fainelli wrote:
> On 3/25/24 05:20, John Ernberg wrote:
>> Hi Florian,
>>
>> On 3/21/24 17:13, Florian Fainelli wrote:
>>> On 3/21/24 09:02, John Ernberg wrote:
>>>> Hi Russell,
>>>>
>>>> On 3/20/24 20:44, Russell King (Oracle) wrote:
>>>>> On Wed, Mar 20, 2024 at 10:13:55AM -0700, Florian Fainelli wrote:
>>>>>>
>>>>>>
>>>>>> On 3/20/2024 9:54 AM, Russell King (Oracle) wrote:
>>>>>>> On Wed, Mar 20, 2024 at 03:25:54PM +0000, John Ernberg wrote:
>>>>>>>> Hi Russel,
>>>>>>>
>>>>>>> Growl. Hi Peter.
>>>>>>>
>>>>>>>> What we really want is the PHY to be suspended on suspend to RAM
>>>>>>>> regardless of us having had an initial link up or not.
>>>>>>>
>>>>>>> So what you're asking is for the PHY to be suspended when the system
>>>>>>> is entering suspend, which is a long time after the system booted 
>>>>>>> and
>>>>>>> thus phy_probe() was called, and could be some time before the 
>>>>>>> system
>>>>>>> resumes.
>>>>>>>
>>>>>>> I'm not sure what the relevance is of phy_probe() that was 
>>>>>>> brought up
>>>>>>> previously then.
>>>>>>>
>>>>>>>> This worked prior to 4c0d2e96ba05 ("net: phy: consider that
>>>>>>>> suspend2ram
>>>>>>>> may cut
>>>>>>>> off PHY power") which was added in Linux 5.11, and 557d5dc83f68
>>>>>>>> ("net:
>>>>>>>> fec: use
>>>>>>>> mac-managed PHY PM") which was added in Linux 5.12.
>>>>>>>
>>>>>>> Looking at the former commit, that looks to me like it is only
>>>>>>> affecting the resume paths, not the suspend paths, so wouldn't have
>>>>>>> any impact itself on what happens when suspend happens.
>>>>>>>
>>>>>>> The latter commit states that it is a work around for an issue 
>>>>>>> with a
>>>>>>> particular PHY. What happens if you revert just this commit, does 
>>>>>>> your
>>>>>>> problem then go away?
>>>>
>>>> Our PHY does not begin working again without reverting both.
>>>> phy_init_hw()
>>>> will remain an issue if it occurs after phy_start().
>>>>
>>>> The commit message in 557d5dc83f68 is not explaining nearly enough, I
>>>> spent a
>>>> few days on it before I proved that commit to be nearly correct (See
>>>> whole
>>>> thread at [1]), it happened to just explode with that PHY. The issue 
>>>> is a
>>>> sequencing issue that was made more prominent by 4c0d2e96ba05, but it
>>>> existed
>>>> since around 2008. Because FEC is both MDIO controller and MAC,
>>>> meaning the
>>>> resume of the link in a link up case runs phy_start() in the FEC resume
>>>> function, which will trigger a mdio bus resume when it completes, in 
>>>> turn
>>>> calling phy_init_hw() (before 4c0d2e96ba05 it was phy_resume() which
>>>> wasn't a
>>>> problem but still wrong sequence wise).
>>>>
>>>>>>>
>>>>>>> Also, please clarify. It seems that you are reporting a regression -
>>>>>>> it used to work for you prior to 557d5dc83f68, but 557d5dc83f68 
>>>>>>> stops
>>>>>>> it working for you?
>>>>>>>
>>>>>>>> Since FEC requires mac_managed_pm the generic PM suspend-resume 
>>>>>>>> paths
>>>>>>>> are not
>>>>>>>> taken. The resume sequencing with generic PM has been broken 
>>>>>>>> with the
>>>>>>>> FEC since
>>>>>>>> generic PM of the mdio bus was added, as the FEC will do 
>>>>>>>> phy_start()
>>>>>>>> (via FEC
>>>>>>>> resume) and then generic PM runs phy_init_hw() via mdio bus resume
>>>>>>>> (previously:
>>>>>>>> less damaging phy_resume()) due to how the FEC IP block works.
>>>>>>>
>>>>>>> That suggests that even with 557d5dc83f68 reverted, it's broken.
>>>>>>> Digging into the history, what you're referring to dates from 
>>>>>>> January
>>>>>>> 2016, so are you reporting a regression that occured 8 _years_ ago,
>>>>>>> at which point I'd question why it's taken 8 years.
>>>>
>>>> A revert of those is absolutely wrong. Those commits are fixing bigger
>>>> issues.
>>>>
>>>>>>>
>>>>>>> Given the time that has passed, I don't think reverting commits is
>>>>>>> a sane approach. Quite what the right solution is though, I'm not
>>>>>>> sure.
>>>>>>>
>>>>>>>     From the description and the commits pointed to, I just don't 
>>>>>>> see
>>>>>>> that there is anything that could've changed with respect to the 
>>>>>>> first
>>>>>>> boot - if that has changed, then I think more research into what
>>>>>>> caused
>>>>>>> it is needed.
>>>>>>>
>>>>>>> If it's the subsequent state after a suspend-resume cycle, then yes,
>>>>>>> I would agree that its possible that these changes broke this for 
>>>>>>> you.
>>>>>>> Would clearing ndev->phydev->mac_managed_pm just before
>>>>>>> phy_disconnect() in fec_enet_close() fix it for you, so the suspend/
>>>>>>> resume paths for the PHY get used when the network interface is 
>>>>>>> down?
>>>>>>>
>>>>>>> Maybe, however, that's something that should happen in any case 
>>>>>>> inside
>>>>>>> phylib on phy_disconnect() as a matter of course, since the PHY will
>>>>>>> at that point be no longer under the control of the network 
>>>>>>> driver for
>>>>>>> PM purposes. Could you give this idea a try please?
>>>>>>>
>>>>>>
>>>>>> On phy_disconnect() we will do a phy_detach() which calls
>>>>>> phy_suspend().
>>>>>> Given that phy_disconnect() is called from fec_enet_close(), we
>>>>>> still have a
>>>>>> MDIO bus registered and we are not trying to suspend the MDIO bus,
>>>>>> so we
>>>>>> should have an effective phy_suspend() call here, what am I missing?
>>>>>
>>>>> I didn't look there, but if that is the case, then what is John's
>>>>> problem - I can't figure it out, something isn't adding up here.
>>>>>
>>>>
>>>> I could instead add extra phy_suspend() in the suspend path if the
>>>> link is
>>>> down and the FEC is up and running. I rejected it originally thinking
>>>> it was
>>>> a much dirtier fix, but maybe that is the more correct thing to do?
>>>
>>> This does not seem like the proper solution, the only time where an
>>> explicit phy_suspend() should be done in the Ethernet MAC's ->suspend()
>>> routine is if the network device was brought up at the time
>>> (netif_runninng() returns true) *and* you set mac_managed_pm = true
>>> because you must precisely control the order in which the MAC and the
>>> PHY get suspended with respect to each other (typically because the PHY
>>> supplies a RX clock back to the MAC, and some of the MAC logic depends
>>> upon it to operate properly, e.g.: perform a proper FIFO flush etc.).
>>
>> I'm having some trouble understanding your message in context of my most
>> recent reply to Russell, so please bear with me here as I will
>> potentially ask a really dumb question:
>>
>> Do I understand this correctly as what used to work in 5.10 was never
>> meant to work and the behavior now is the correct one in the FEC case?
> 
> I am not sure about that, because I do not believe this had been 
> explicitly accounted for, and that is what I am also trying to figure out.
> 
>> Meaning that if the link has never been up the PHY must never be handled
>> from a power management perspective?
> 
> Let's stray away from saying "link is UP" but instead say interface 
> administratively brought up (ip link set dev <foo> up) so we are on the 
> same page.
> 
>>
>> The only PHY examples I have come across (though not many in total) the
>> PHY has done some initial configuration of itself after POR or release
>> of the reset line.
>>
>>>
>>>   From there, I see two distinct cases:
>>>
>>> - the network device driver probed, but the network device was never
>>> brought up in the first place in that case, I do not see a path whereby
>>> the PHY would have been suspended, unless the boot firmware already took
>>> care of that (which arguably it should if you are trying to be as power
>>> efficient as possible), although arguably there could be a path within
>>> the kernel where this is also done. It could get really complicated 
>>> however
>>
>> Generic PM via mdio_bus_phy_suspend() will suspend the PHY if it has a
>> .suspend callback and mac_managed_pm isn't set.
> 
> Correct.
> 
>>
>> mdio_bus_phy_may_suspend() will see that netdev is NULL, which means it
>> returns the inverse of phy->suspended (which is false), meaning the
>> function returns true. Thus phy_suspend() is called.
> 
> OK, so right there is where you were were basically depending upon 
> mdio_bus_phy_may_suspend() being called to suspend your PHY device! I 
> don't think this had ever been envisioned to be working that way, if it 
> did that was a byproduct rather than a contract. Introducing 
> mac_managed_pm certainly did break that contract because now we no 
> longer have that "double" suspend and resume that used to happen.

I think I understand you know, thanks for clearing that up. Based on 
this, even if the missing suspend in cases where the interface was never 
administrative up is technically a regression it sounds like a much more 
robust fix to our problem would be to do what I considered a work-around 
before (using "ip link set <dev> up && ip link set <dev> down" at boot) 
would likely be the most correct way to go about it.

I will be dropping this patch from the set in the next iteration.

> 
>>
>>>
>>> - the network device driver probed, and the network device was brought
>>> up at least once (regardless of whether a link state was detected or
>>> not), such that the PHY has gone through a phy_start()/phy_stop() cycle,
>>> and upon phy_stop() a phy_suspend() has been called
>>>
>>> It is safe to assume you fall in the first case only, or do you also see
>>> a problem in the second case as well?
>>
>> There is only a problem in the first case. The second case is working as
>> expected.
> 
> Thanks for the clarification.

Thanks! // John Ernberg