netdev - Re: handling of phy_stop() and phy_stop

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Wed, 7 Feb 2018 15:03:17 -0800
From:   Florian Fainelli <f.fainelli@...il.com>
To:     Russell King - ARM Linux <linux@...linux.org.uk>,
        Heiner Kallweit <hkallweit1@...il.com>
Cc:     Andrew Lunn <andrew@...n.ch>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: handling of phy_stop() and phy_stop_machine() in phylib

On 02/07/2018 01:13 PM, Russell King - ARM Linux wrote:
> On Wed, Feb 07, 2018 at 09:56:37PM +0100, Heiner Kallweit wrote:
>> Am 04.02.2018 um 03:48 schrieb Florian Fainelli:
>>>
>>>
>>> On 02/03/2018 03:58 PM, Heiner Kallweit wrote:
>>>> Am 03.02.2018 um 21:17 schrieb Andrew Lunn:
>>>>> On Sat, Feb 03, 2018 at 05:41:54PM +0100, Heiner Kallweit wrote:
>>>>>> This commit forces callers of phy_resume() and phy_suspend() to hold
>>>>>> mutex phydev->lock. This was done for calls to phy_resume() and
>>>>>> phy_suspend() in phylib, however there are more callers in network
>>>>>> drivers. I'd assume that these other calls issue a warning now
>>>>>> because of the lock not being held.
>>>>>> So is there something I miss or would this have to be fixed?
>>>>>
>>>>> Hi Heiner
>>>>>
>>>>> This is a good point.
>>>>>
>>>>> Yes, it looks like some fixes are needed. But what exactly?
>>>>>
>>>>> The phy state machine will suspend and resume the phy is you call
>>>>> phy_stop() and phy_start() in the MAC suspend and resume functions.
>>>>>
>>>> AFAICS phy_stop() doesn't suspend the PHY. It just sets the state
>>>> to PHY_HALTED and (at least if we're not in polling mode) doesn't
>>>> call phy_suspend(). Maybe a call to phy_trigger_machine() is
>>>> needed like in phy_start() ? Then the state machine would call
>>>> phy_suspend(), provided the link is still up.
>>>
>>> Right, phy_stop() merely just moves the state machine to PHY_HALTED and
>>> this is actually a great source of problems which I tried to address here:
>>>
>>> https://www.mail-archive.com/netdev@vger.kernel.org/msg196061.html
>>>
>>> because phy_stop() is not a synchronous call, so when it returns the
>>> state machine might still be running (it can take up to a 1 HZ depending
>>> on when you called phy_stop()) and so if you took that as a
>>> synchronization point to e.g: turn of your Ethernet MAC/MDIO bus clocks,
>>> you will likely see problems. phy_stop_machine() would provide that
>>> synchronization point, but is not currently exported, despite being a
>>> global symbol. This patch series above is all well and good, except that
>>> Geert reported issues with suspend/resume interactions which I have not
>>> been able to track down.
>>>
>> To not confuse readers I changed the subject of the mail to reflect that
>> the discussion isn't about the original topic any longer.
>>
>> It seems to me that (at least one) reason for the issues is that pm
>> callbacks for the phy device and the network device interfere.
>> phy device pm (mdio_bus_phy_suspend et al) feels responsible for dealing
>> with suspending/resuming the PHY, and the network driver pm callbacks
>> as well.
>>
>> Maybe, if the network driver takes care, it should inform phy device pm
>> to do nothing. For this we could add a flag to phydev.mdio.flags.
>> If the network driver sets this flag then mdio_bus_phy_suspend()
>> and mdio_bus_phy_resume() could turn into no-ops.
>> Not totally sure yet about mdio_bus_phy_restore() ..
> 
> What if the MDIO bus is handled by a separate device and the MDIO bus
> is suspended prior to the network driver, thereby making the PHY
> inaccessible?

Indeed. We can know that in the PHY library though, because there is
logic to hold the module reference count, see phy_attach_direct(), we
cannot quite trust whether the Ethernet controller does the right thing
though, as I can think of several ways for things to be done wrong like:
CONFIG_PM is enabled, but the Ethernet driver does not implement any
suspend/resume callback, or does it wrong etc...
-- 
Florian