lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <521d30d8-91b5-414f-93bd-19f86bba4aa0@actia.se>
Date: Wed, 28 Feb 2024 07:59:27 +0000
From: John Ernberg <john.ernberg@...ia.se>
To: Wei Fang <wei.fang@....com>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>
CC: Jonas Blixt <jonas.blixt@...ia.se>, Shenwei Wang <shenwei.wang@....com>,
	Clark Wang <xiaoning.wang@....com>, Andrew Lunn <andrew@...n.ch>, "Heiner
 Kallweit" <hkallweit1@...il.com>, Russell King <linux@...linux.org.uk>
Subject: Re: Broken networking on iMX8QXP after suspend after upgrading from
 5.10 to 6.1

Hi Wei,

On 2/19/24 03:25, Wei Fang wrote:
> Hi John,
> 
>> -----Original Message-----
>> From: John Ernberg <john.ernberg@...ia.se>
>> Sent: 2024年2月8日 21:17
>> To: netdev@...r.kernel.org
>> Cc: Jonas Blixt <jonas.blixt@...ia.se>; Wei Fang <wei.fang@....com>;
>> Shenwei Wang <shenwei.wang@....com>; Clark Wang
>> <xiaoning.wang@....com>; Andrew Lunn <andrew@...n.ch>; Heiner
>> Kallweit <hkallweit1@...il.com>; Russell King <linux@...linux.org.uk>
>> Subject: Broken networking on iMX8QXP after suspend after upgrading from
>> 5.10 to 6.1
>>
>> Hi,
>>
>> We just upgraded vendor kernel from 5.10 to 6.1 and ended up with broken
>> networking on our board  unless we bring the PHY up before the first
>> suspend of the system.
>>
>> The link is brought up via external signal, so it is not guaranteed to have been
>> UP before the first system suspend.
>>
>> We'd like to run the mainline kernel but we're not in a position to do so yet.
>> But we hope we can get some advice on this problem anyway.
>>
>> We have a permanently powered Microchip LAN8700R (microchip_t1.c)
>> connected to an iMX8QXP (fec), to be able to wake the system via network if
>> the link is up.
>>
>> This setup was working fine in 5.10.
>>
>> The offending commit as far as we could bisect it is:
>> 557d5dc83f68 ("net: fec: use mac-managed PHY PM") And somewhat:
>> fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages
>> PHY PM")
>>
>> If the interface has not been brought UP before the system suspends we can
>> see this in dmesg:
>>
>>       fec 5b040000.ethernet eth0: MDIO read timeout
>>       Microchip LAN87xx T1 5b040000.ethernet-1:04: PM:
>> dpm_run_callback(): mdio_bus_phy_resume+0x0/0xc8 returns -110
>>       Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: failed to resume:
>> error -110
>>
>> In this state it is impossible to bring the link up before removing all power
>> from the board and then plugging it in again, since the PHY is permanently
>> powered.
>>
>> My understanding here is that since the link has never been UP,
>> fec_enet_open() has never executed, therefor mac_managed_pm is not true.
>> This in turn makes us take the normal PM flow.
>> Likewise in fec_resume() if the interface is not running, the MAC isn't enabled
>> because on the iMX8QXP the FEC is powered down in the suspend path but
>> never re-initialized and enabled in the resume path, so the MAC is powered
>> back up, but still disabled.
>>
>> Adding the following seems to fix the issue, but I personally don't like this,
>> because we just allow the non-mac_managed_pm flow to run longer by
>> enabling the MAC again rather than letting the MAC do the PM as configured
>> in fec_enet_open().
>> What would be the correct thing to do here?
> 
> Sorry for the delayed response.

I must equally apologize for the delayed response.

Your patch helped greatly finding the actual root cause of the problem
(which pre-dates 5.10):

f166f890c8f0 ("net: ethernet: fec: Replace interrupt driven MDIO with 
polled IO")

How 5.10 worked for us is a mystery, because a suspend-resume cycle before
link up writes to MII_DATA register before fec_restart() is called, which
restores the MII_SPEED register, triggering the MII_EVENT quirk.

> Have you tried setting mac_management_pm to true after mdiobus registration?
> Just like below:

I have tested your patch and it does fix my issue, with your patch I also
realized a side-effect of mac_managed_pm in the FEC driver. The PHY will
never suspend due to the current implementation of fec_suspend() and
fec_resume().

phy_suspend() and phy_resume() are never called from FEC code.

May I pick up your patch with a signed-off from you? I would like to make
it a small series adding also suspend/resume of the PHY.

If you want to send it yourself instead, please pick up these tags:
Fixes: 557d5dc83f68 ("net: fec: use mac-managed PHY PM")
Closes: 
https://lore.kernel.org/netdev/1f45bdbe-eab1-4e59-8f24-add177590d27@actia.se/
Reported-by: John Ernberg <john.ernberg@...ia.se>
Tested-by: John Ernberg <john.ernberg@...ia.se>

And then I send a separate patch with yours as a dependency.

Thanks! // John Ernberg

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ