netdev - Re: handling of phy_stop() and phy_stop

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ab8f592f-5444-a5ab-e6e2-16f566fbe681@gmail.com>
Date:   Wed, 7 Feb 2018 21:56:37 +0100
From:   Heiner Kallweit <hkallweit1@...il.com>
To:     Florian Fainelli <f.fainelli@...il.com>,
        Andrew Lunn <andrew@...n.ch>
Cc:     Russell King <rmk+kernel@...linux.org.uk>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: handling of phy_stop() and phy_stop_machine() in phylib

Am 04.02.2018 um 03:48 schrieb Florian Fainelli:
> 
> 
> On 02/03/2018 03:58 PM, Heiner Kallweit wrote:
>> Am 03.02.2018 um 21:17 schrieb Andrew Lunn:
>>> On Sat, Feb 03, 2018 at 05:41:54PM +0100, Heiner Kallweit wrote:
>>>> This commit forces callers of phy_resume() and phy_suspend() to hold
>>>> mutex phydev->lock. This was done for calls to phy_resume() and
>>>> phy_suspend() in phylib, however there are more callers in network
>>>> drivers. I'd assume that these other calls issue a warning now
>>>> because of the lock not being held.
>>>> So is there something I miss or would this have to be fixed?
>>>
>>> Hi Heiner
>>>
>>> This is a good point.
>>>
>>> Yes, it looks like some fixes are needed. But what exactly?
>>>
>>> The phy state machine will suspend and resume the phy is you call
>>> phy_stop() and phy_start() in the MAC suspend and resume functions.
>>>
>> AFAICS phy_stop() doesn't suspend the PHY. It just sets the state
>> to PHY_HALTED and (at least if we're not in polling mode) doesn't
>> call phy_suspend(). Maybe a call to phy_trigger_machine() is
>> needed like in phy_start() ? Then the state machine would call
>> phy_suspend(), provided the link is still up.
> 
> Right, phy_stop() merely just moves the state machine to PHY_HALTED and
> this is actually a great source of problems which I tried to address here:
> 
> https://www.mail-archive.com/netdev@vger.kernel.org/msg196061.html
> 
> because phy_stop() is not a synchronous call, so when it returns the
> state machine might still be running (it can take up to a 1 HZ depending
> on when you called phy_stop()) and so if you took that as a
> synchronization point to e.g: turn of your Ethernet MAC/MDIO bus clocks,
> you will likely see problems. phy_stop_machine() would provide that
> synchronization point, but is not currently exported, despite being a
> global symbol. This patch series above is all well and good, except that
> Geert reported issues with suspend/resume interactions which I have not
> been able to track down.
> 
To not confuse readers I changed the subject of the mail to reflect that
the discussion isn't about the original topic any longer.

It seems to me that (at least one) reason for the issues is that pm
callbacks for the phy device and the network device interfere.
phy device pm (mdio_bus_phy_suspend et al) feels responsible for dealing
with suspending/resuming the PHY, and the network driver pm callbacks
as well.

Maybe, if the network driver takes care, it should inform phy device pm
to do nothing. For this we could add a flag to phydev.mdio.flags.
If the network driver sets this flag then mdio_bus_phy_suspend()
and mdio_bus_phy_resume() could turn into no-ops.
Not totally sure yet about mdio_bus_phy_restore() ..


> We should most definitively try to consolidate the different PHY
> suspend/resume within the Ethernet MAC suspend/resume implementation and
> document exactly what the appropriate behavior must be under the
> following circumstances:
> 
> - when to call phy_stop() + phy_stop_machine()
> - when to call phy_suspend() (if the network interface does do not WoL)
> - when to call phy_resume() (if needed, actually, it usually is not)
> - when to call phy_start()
> 
> I don't unfortunately have the time to code this myself at the moment,
> but I will happily review patches if you have the opportunity to do so.
> 
>>
>> However, if the link is down already (due to whatever calls
>> around phy_stop() in the driver) then phy_suspend() wouldn't be
>> called.
> 
> Correct, there is an implicit assumption that when the link is down,
> there is an opportunity for the Ethernet MAC driver to put things in low
> power, and the PHY itself, should be in a lower power mode where only
> link/energy detection might be utilizing power. At least this is the theory.
> 
>>
>> Heiner
>>
>>> A few examples:
>>>
>>> tc35815_suspend(), ravb_suspend() via ravb_close(), sh_eth_suspend()
>>> via sh_eth_close(), fec_suspend(), mpc52xx_fec_of_suspend() via
>>> mpc52xx_fec_close(), ucc_geth_suspend(), etc...
>>>
>>> So i suspect those drivers which call phy_suspend()/phy_resume()
>>> should really be modified to call phy_stop()/phy_start().
>>>
>>> hns_nic_config_phy_loopback() is just funky, and probably needs the
>>> help of the hns guys to fix.
>>>
>>> dsa_slave_suspend() already does a phy_stop(), so the phy_suspend()
>>> can be removed.
>>>
>>> The comments in lpc_eth_open() suggest the phy_resume() is needed, so
>>> locks should be added. socfpga_dwmac_resume() seems to be the same.
>>>
>>>     Andrew
>>>
>>
>