Date:   Fri, 1 Dec 2017 09:36:42 -0800
From:   Florian Fainelli <f.fainelli@...il.com>
To:     Russell King - ARM Linux <linux@...linux.org.uk>,
        Grygorii Strashko <grygorii.strashko@...com>
Cc:     Yan Markman <ymarkman@...vell.com>,
        Antoine Tenart <antoine.tenart@...e-electrons.com>,
        "andrew@...n.ch" <andrew@...n.ch>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "gregory.clement@...e-electrons.com" 
        <gregory.clement@...e-electrons.com>,
        "thomas.petazzoni@...e-electrons.com" 
        <thomas.petazzoni@...e-electrons.com>,
        "miquel.raynal@...e-electrons.com" <miquel.raynal@...e-electrons.com>,
        Nadav Haklai <nadavh@...vell.com>,
        "mw@...ihalf.com" <mw@...ihalf.com>,
        Stefan Chulski <stefanc@...vell.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [EXT] Re: [PATCH net] net: phylink: fix link state on phy-connect

On 12/01/2017 09:24 AM, Russell King - ARM Linux wrote:
> On Fri, Dec 01, 2017 at 11:07:22AM -0600, Grygorii Strashko wrote:
>> Hi Russell,
>>
>> On 11/30/2017 07:28 AM, Russell King - ARM Linux wrote:
>>> On Thu, Nov 30, 2017 at 10:10:18AM +0000, Russell King - ARM Linux wrote:
>>>> On Thu, Nov 30, 2017 at 08:51:21AM +0000, Yan Markman wrote:
>>>>> The phylink_stop is called before phylink_disconnect_phy
>>>>> You could see in mvpp2.c:
>>>>>
>>>>> mvpp2_stop_dev() {
>>>>> 	phylink_stop(port->phylink);
>>>>> }
>>>>>
>>>>> mvpp2_stop()       {
>>>>> 	mvpp2_stop_dev(port);
>>>>> 	phylink_disconnect_phy(port->phylink);
>>>>> }
>>>>>
>>>>> .ndo_stop = mvpp2_stop,
>>>>
>>>> Sorry, I don't have this in mvpp2.c, so I have no visibility of what
>>>> you're working with.
>>>>
>>>> What you have above looks correct, and I see no reason why the p21
>>>> patch would not have resolved your issue.  The p21 patch ensures
>>>> that phylink_resolve() gets called and completes before phylink_stop()
>>>> returns.  In that case, phylink_resolve() will call the mac_link_down()
>>>> method if the link is not already down.  It will also print the "Link
>>>> is Down" message.
>>>>
>>>> Florian has already tested this patch after encountering a similar
>>>> issue, and has reported that it solves the problem for him.  I've also
>>>> tested it with mvneta, and the original mvpp2x driver on Macchiatobin.
>>>>
>>>> Maybe there's something different about mvpp2, but as I have no
>>>> visibility of that driver and the modifications therein, I can't
>>>> comment further other than stating that it works for three different
>>>> implementations.
>>>>
>>>> Maybe you could try and work out what's going on with the p21 patch
>>>> in your case?
>>>
>>> I think I now realise what's probably going on.
>>>
>>> If you call netif_carrier_off() before phylink_stop(), then phylink will
>>> believe that the link is already down, and so it won't bother calling
>>> mac_link_down().
>>>
>>> I'll update the documentation for phylink_stop() to spell out this
>>> aspect.
>>>
>>
>> There is a pretty high number of net drivers which do call
>> 	netif_carrier_off(dev);
>> before
>> 	phy_stop(dev->phydev);
>> in .ndo_stop() callback.
>>
>> As per your comment this seems to be incorrect, so should such calls be
>> removed?
> 
> Well, I think the question that needs to be asked is this:
> 
>   Is calling netif_carrier_off() before phy_stop() safe?
> 
> Well, reading the phylib code, this is the answer I've come to:
> 
>   Between phy_start() and phy_stop(), phylib is free to manage the
>   carrier state itself through the phylib state machine.
> 
>   This means if you call netif_carrier_off() prior to phy_stop(),
>   there is nothing preventing the phylib state machine from running,
>   and a co-incident poll of the PHY could notice that the link has
>   come up, and re-enable the carrier while your ndo_stop() method
>   is still running.
> 
> So, my conclusion is that this practice is provably racy, though
> it's probably not that easy to trigger the race (which is probably
> why no one has reported the problem.)
> 
> Given that it's racy, it's not something that I think phylink should
> care about, and should "softly" discourage it.  So, I'm happy with
> what phylink is doing here, and I suggest fixing the drivers for
> this race.
> 
> In any case, it should result in less code in the drivers - since
> the work you need to do when the link goes down is a subset of the
> work you need to do when the network interface is taken down.
> 

While I agree with everything written above, in practice calling
netif_carrier_off() when using PHYLIB can at most cause an inconsistent
carrier state. It does not upset the state machine itself, because
PHYLIB does not use netif_carrier_ok() to decide whether the link has
dropped; it bases that decision solely on phydev->link.

This is not true with PHYLINK, which is why the problem was observed here.
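
To illustrate the difference, here is a minimal userspace sketch. It is
a toy model and not the kernel code: struct model_port and the model_*()
functions are hypothetical names invented for this example. It assumes
only what is described in this thread, namely that the PHYLIB stop path
looks at phydev->link, while the PHYLINK resolve path compares the
wanted link state against the carrier flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name here is hypothetical, not a kernel API. */
struct model_port {
	bool carrier_ok;          /* stands in for netif_carrier_ok(ndev) */
	bool phy_link;            /* stands in for phydev->link */
	int mac_link_down_calls;  /* how often the MAC was told "link down" */
};

/* PHYLIB-like stop: the decision depends only on phy_link. */
static void model_phylib_stop(struct model_port *p)
{
	assert(p);
	if (p->phy_link) {
		p->phy_link = false;
		p->carrier_ok = false;
		p->mac_link_down_calls++;
	}
}

/* PHYLINK-like stop: the wanted state (down) is compared to the carrier. */
static void model_phylink_stop(struct model_port *p)
{
	bool want_link = false;

	assert(p);
	if (want_link != p->carrier_ok) {
		p->carrier_ok = false;
		p->mac_link_down_calls++;
	}
	p->phy_link = false;
}

/*
 * Model an ndo_stop() on a port whose link is up, optionally turning the
 * carrier off first. Returns the number of "link down" notifications
 * that reached the MAC.
 */
static int model_stop(bool use_phylink, bool carrier_off_first)
{
	struct model_port p = {
		.carrier_ok = true,
		.phy_link = true,
		.mac_link_down_calls = 0,
	};

	if (carrier_off_first)
		p.carrier_ok = false;

	if (use_phylink)
		model_phylink_stop(&p);
	else
		model_phylib_stop(&p);

	return p.mac_link_down_calls;
}
```

In this model, model_stop(false, true) still returns 1 because the
PHYLIB-like path never consults the carrier flag, whereas
model_stop(true, true) returns 0: the PHYLINK-like path sees the carrier
already off and skips the notification, which matches the symptom
reported in this thread.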
-- 
Florian
