Message-ID: <4b67f2ac-3fe0-9567-404b-1e58b0fc5eaf@gmail.com>
Date:   Sat, 3 Feb 2018 18:48:44 -0800
From:   Florian Fainelli <f.fainelli@...il.com>
To:     Heiner Kallweit <hkallweit1@...il.com>,
        Andrew Lunn <andrew@...n.ch>
Cc:     Russell King <rmk+kernel@...linux.org.uk>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: Potential issue with f5e64032a799 "net: phy: fix resume handling"



On 02/03/2018 03:58 PM, Heiner Kallweit wrote:
> On 03.02.2018 at 21:17, Andrew Lunn wrote:
>> On Sat, Feb 03, 2018 at 05:41:54PM +0100, Heiner Kallweit wrote:
>>> This commit forces callers of phy_resume() and phy_suspend() to hold
>>> mutex phydev->lock. This was done for calls to phy_resume() and
>>> phy_suspend() in phylib, however there are more callers in network
>>> drivers. I'd assume that these other calls issue a warning now
>>> because of the lock not being held.
>>> So is there something I miss or would this have to be fixed?
>>
>> Hi Heiner
>>
>> This is a good point.
>>
>> Yes, it looks like some fixes are needed. But what exactly?
>>
>> The phy state machine will suspend and resume the phy if you call
>> phy_stop() and phy_start() in the MAC suspend and resume functions.
>>
> AFAICS phy_stop() doesn't suspend the PHY. It just sets the state
> to PHY_HALTED and (at least if we're not in polling mode) doesn't
> call phy_suspend(). Maybe a call to phy_trigger_machine() is
> needed like in phy_start() ? Then the state machine would call
> phy_suspend(), provided the link is still up.

Right, phy_stop() merely moves the state machine to PHY_HALTED, and
this is actually a great source of problems, which I tried to address here:

https://www.mail-archive.com/netdev@vger.kernel.org/msg196061.html

phy_stop() is not a synchronous call: when it returns, the state
machine might still be running (it can take up to 1 HZ, i.e. about one
second, depending on when you called phy_stop()). If you take its return
as a synchronization point to, e.g., turn off your Ethernet MAC/MDIO bus
clocks, you will likely see problems. phy_stop_machine() would provide
that synchronization point, but is not currently exported, despite being
a global symbol. The patch series above is all well and good, except that
Geert reported issues with suspend/resume interactions which I have not
been able to track down.
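
The race described above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: every model_*
name is a hypothetical stand-in, not a kernel function, and the
synchronous stop is modeled simply as "no state machine work is left
queued once it returns".

```c
/*
 * Userspace model only. The model_* functions are hypothetical
 * stand-ins for phy_stop(), phy_stop_machine() and the delayed
 * state machine work; they are not the kernel implementations.
 */
#include <stdbool.h>

enum model_state { MODEL_RUNNING, MODEL_HALTED };

struct model_phy {
	enum model_state state;
	bool work_pending;     /* state machine work still queued */
	bool clocks_on;        /* MAC/MDIO bus clocks */
	int bad_mdio_accesses; /* MDIO accesses attempted with clocks off */
};

/* The queued state machine work: when it runs, it touches the MDIO bus. */
static void model_state_machine_work(struct model_phy *phy)
{
	if (!phy->work_pending)
		return;
	phy->work_pending = false;
	if (!phy->clocks_on)
		phy->bad_mdio_accesses++;
}

/* Like phy_stop() as described above: records the state, does not wait. */
static void model_phy_stop(struct model_phy *phy)
{
	phy->state = MODEL_HALTED;
}

/* Assumed behaviour of a synchronous stop: no work is left afterwards. */
static void model_phy_stop_machine(struct model_phy *phy)
{
	phy->work_pending = false;
}

/* Returns the number of MDIO accesses made after the clocks were gated. */
int model_suspend(bool use_stop_machine)
{
	struct model_phy phy = {
		.state = MODEL_RUNNING,
		.work_pending = true,
		.clocks_on = true,
		.bad_mdio_accesses = 0,
	};

	model_phy_stop(&phy);
	if (use_stop_machine)
		model_phy_stop_machine(&phy);
	phy.clocks_on = false;          /* driver gates its clocks */
	model_state_machine_work(&phy); /* late work fires if still queued */
	return phy.bad_mdio_accesses;
}
```

In this model, model_suspend(false) reports one MDIO access after the
clocks were gated, while model_suspend(true) reports none.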

We should most definitely try to consolidate the different PHY
suspend/resume calls within the Ethernet MAC suspend/resume
implementations and document exactly what the appropriate behavior must
be under the following circumstances:

- when to call phy_stop() + phy_stop_machine()
- when to call phy_suspend() (if the network interface does not do WoL)
- when to call phy_resume() (if needed, actually, it usually is not)
- when to call phy_start()
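
One possible ordering for the calls listed above, written as a
userspace sketch. This is an assumption for illustration, not an agreed
or documented behavior: foo_mac_suspend(), foo_mac_resume() and the
stub_* helpers are hypothetical names, and the stubs only record the
order in which they are called. phy_resume() is left out because it is
usually not needed.

```c
/*
 * Userspace sketch only. The stub_* functions are hypothetical
 * stand-ins that log their name; they do not call into phylib.
 */
#include <stdbool.h>
#include <string.h>

static char call_log[128];

/* Append a call name to the log, separated by single spaces. */
static void log_call(const char *name)
{
	if (call_log[0] != '\0')
		strcat(call_log, " ");
	strcat(call_log, name);
}

static void stub_phy_stop(void)         { log_call("phy_stop"); }
static void stub_phy_stop_machine(void) { log_call("phy_stop_machine"); }
static void stub_phy_suspend(void)      { log_call("phy_suspend"); }
static void stub_phy_start(void)        { log_call("phy_start"); }

/* Hypothetical MAC suspend path; returns the recorded call order. */
const char *foo_mac_suspend(bool wol_enabled)
{
	call_log[0] = '\0';
	stub_phy_stop();         /* request PHY_HALTED */
	stub_phy_stop_machine(); /* wait for the state machine to be idle */
	if (!wol_enabled)
		stub_phy_suspend(); /* power down only when WoL is not in use */
	return call_log;
}

/* Hypothetical MAC resume path; returns the recorded call order. */
const char *foo_mac_resume(void)
{
	call_log[0] = '\0';
	stub_phy_start();
	return call_log;
}
```

A real driver would also have to gate its MAC/MDIO clocks only after
the synchronous stop, and take phydev->lock wherever phylib requires it;
neither is modeled here.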

Unfortunately, I don't have the time to code this myself at the moment,
but I will happily review patches if you have the opportunity to do so.

> 
> However, if the link is down already (due to whatever calls
> around phy_stop() in the driver) then phy_suspend() wouldn't be
> called.

Correct, there is an implicit assumption that when the link is down,
there is an opportunity for the Ethernet MAC driver to put things in low
power, and that the PHY itself should be in a lower power mode where only
link/energy detection might be using power. At least this is the theory.

> 
> Heiner
> 
>> A few examples:
>>
>> tc35815_suspend(), ravb_suspend() via ravb_close(), sh_eth_suspend()
>> via sh_eth_close(), fec_suspend(), mpc52xx_fec_of_suspend() via
>> mpc52xx_fec_close(), ucc_geth_suspend(), etc...
>>
>> So I suspect those drivers which call phy_suspend()/phy_resume()
>> should really be modified to call phy_stop()/phy_start().
>>
>> hns_nic_config_phy_loopback() is just funky, and probably needs the
>> help of the hns guys to fix.
>>
>> dsa_slave_suspend() already does a phy_stop(), so the phy_suspend()
>> can be removed.
>>
>> The comments in lpc_eth_open() suggest the phy_resume() is needed, so
>> locks should be added. socfpga_dwmac_resume() seems to be the same.
>>
>>     Andrew
>>
> 

-- 
Florian
