lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 5 Feb 2018 22:48:55 +0100
From:   Heiner Kallweit <hkallweit1@...il.com>
To:     Florian Fainelli <f.fainelli@...il.com>,
        Andrew Lunn <andrew@...n.ch>
Cc:     Russell King <rmk+kernel@...linux.org.uk>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: Potential issue with f5e64032a799 "net: phy: fix resume handling"

Am 04.02.2018 um 03:48 schrieb Florian Fainelli:
> 
> 
> On 02/03/2018 03:58 PM, Heiner Kallweit wrote:
>> Am 03.02.2018 um 21:17 schrieb Andrew Lunn:
>>> On Sat, Feb 03, 2018 at 05:41:54PM +0100, Heiner Kallweit wrote:
>>>> This commit forces callers of phy_resume() and phy_suspend() to hold
>>>> mutex phydev->lock. This was done for calls to phy_resume() and
>>>> phy_suspend() in phylib, however there are more callers in network
>>>> drivers. I'd assume that these other calls issue a warning now
>>>> because of the lock not being held.
>>>> So is there something I miss or would this have to be fixed?
>>>
>>> Hi Heiner
>>>
>>> This is a good point.
>>>
>>> Yes, it looks like some fixes are needed. But what exactly?
>>>
>>> The phy state machine will suspend and resume the phy is you call
>>> phy_stop() and phy_start() in the MAC suspend and resume functions.
>>>
>> AFAICS phy_stop() doesn't suspend the PHY. It just sets the state
>> to PHY_HALTED and (at least if we're not in polling mode) doesn't
>> call phy_suspend(). Maybe a call to phy_trigger_machine() is
>> needed like in phy_start() ? Then the state machine would call
>> phy_suspend(), provided the link is still up.
> 
> Right, phy_stop() merely just moves the state machine to PHY_HALTED and
> this is actually a great source of problems which I tried to address here:
> 
> https://www.mail-archive.com/netdev@vger.kernel.org/msg196061.html
> 
> because phy_stop() is not a synchronous call, so when it returns the
> state machine might still be running (it can take up to a 1 HZ depending
> on when you called phy_stop()) and so if you took that as a
> synchronization point to e.g: turn of your Ethernet MAC/MDIO bus clocks,
> you will likely see problems. phy_stop_machine() would provide that
> synchronization point, but is not currently exported, despite being a
> global symbol. This patch series above is all well and good, except that
> Geert reported issues with suspend/resume interactions which I have not
> been able to track down.
> 
> We should most definitively try to consolidate the different PHY
> suspend/resume within the Ethernet MAC suspend/resume implementation and
> document exactly what the appropriate behavior must be under the
> following circumstances:
> 
> - when to call phy_stop() + phy_stop_machine()
> - when to call phy_suspend() (if the network interface does do not WoL)
> - when to call phy_resume() (if needed, actually, it usually is not)
> - when to call phy_start()
> 

I think phy_start() / phy_start_machine() / phy_start_interrupts()
belong together and we may call the latter two functions from phy_start().
Same for stop.

This would mean:
- Remove call to phy_start_interrupts() from phy_connect_direct()
- Call phy_start_machine() and phy_start_interrupts() from phy_start()
- mdio_bus_phy_suspend() calls phy_stop()
Same for stop, plus: phy_error() calls phy_stop().

In this setup a second call to phy_stop() wouldn't hurt because state
is PHY_HALTED already and phy_stop() is a no-op.

A functional change would be that interrupts are disabled during system
suspend (except WoL because we don't suspend the PHY is this case). 

These are first thoughts and therefore it's fine if you totally disagree ..
I didn't test this yet, it's only a "Gedankenexperiment" so far.

When talking about suspend/resume I think we talked about system suspend /
resume. However I think we need to consider also runtime pm.
If a link is down the network driver may decide to runtime-suspend the PHY
(power it down). In case of runtime pm I'd say we need to keep irq and
workqueue active to be able to react if a cable is plugged in and the PHY
wakes up automatically and establishes a link.

Heiner


> I don't unfortunately have the time to code this myself at the moment,
> but I will happily review patches if you have the opportunity to do so.
> 
>>
>> However, if the link is down already (due to whatever calls
>> around phy_stop() in the driver) then phy_suspend() wouldn't be
>> called.
> 
> Correct, there is an implicit assumption that when the link is down,
> there is an opportunity for the Ethernet MAC driver to put things in low
> power, and the PHY itself, should be in a lower power mode where only
> link/energy detection might be utilizing power. At least this is the theory.
> 
>>
>> Heiner
>>
>>> A few examples:
>>>
>>> tc35815_suspend(), ravb_suspend() via ravb_close(), sh_eth_suspend()
>>> via sh_eth_close(), fec_suspend(), mpc52xx_fec_of_suspend() via
>>> mpc52xx_fec_close(), ucc_geth_suspend(), etc...
>>>
>>> So i suspect those drivers which call phy_suspend()/phy_resume()
>>> should really be modified to call phy_stop()/phy_start().
>>>
>>> hns_nic_config_phy_loopback() is just funky, and probably needs the
>>> help of the hns guys to fix.
>>>
>>> dsa_slave_suspend() already does a phy_stop(), so the phy_suspend()
>>> can be removed.
>>>
>>> The comments in lpc_eth_open() suggest the phy_resume() is needed, so
>>> locks should be added. socfpga_dwmac_resume() seems to be the same.
>>>
>>>     Andrew
>>>
>>
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ