Message-ID: <20171201172440.GK10595@n2100.armlinux.org.uk>
Date:   Fri, 1 Dec 2017 17:24:40 +0000
From:   Russell King - ARM Linux <linux@...linux.org.uk>
To:     Grygorii Strashko <grygorii.strashko@...com>
Cc:     Yan Markman <ymarkman@...vell.com>,
        Antoine Tenart <antoine.tenart@...e-electrons.com>,
        "andrew@...n.ch" <andrew@...n.ch>,
        "f.fainelli@...il.com" <f.fainelli@...il.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "gregory.clement@...e-electrons.com" 
        <gregory.clement@...e-electrons.com>,
        "thomas.petazzoni@...e-electrons.com" 
        <thomas.petazzoni@...e-electrons.com>,
        "miquel.raynal@...e-electrons.com" <miquel.raynal@...e-electrons.com>,
        Nadav Haklai <nadavh@...vell.com>,
        "mw@...ihalf.com" <mw@...ihalf.com>,
        Stefan Chulski <stefanc@...vell.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [EXT] Re: [PATCH net] net: phylink: fix link state on phy-connect

On Fri, Dec 01, 2017 at 11:07:22AM -0600, Grygorii Strashko wrote:
> Hi Russell,
> 
> On 11/30/2017 07:28 AM, Russell King - ARM Linux wrote:
> > On Thu, Nov 30, 2017 at 10:10:18AM +0000, Russell King - ARM Linux wrote:
> >> On Thu, Nov 30, 2017 at 08:51:21AM +0000, Yan Markman wrote:
> >>> The phylink_stop is called before phylink_disconnect_phy
> >>> You could see in mvpp2.c:
> >>>
> >>> mvpp2_stop_dev() {
> >>> 	phylink_stop(port->phylink);
> >>> }
> >>>
> >>> mvpp2_stop()       {
> >>> 	mvpp2_stop_dev(port);
> >>> 	phylink_disconnect_phy(port->phylink);
> >>> }
> >>>
> >>> .ndo_stop = mvpp2_stop,
> >>
> >> Sorry, I don't have this in mvpp2.c, so I have no visibility of what
> >> you're working with.
> >>
> >> What you have above looks correct, and I see no reason why the p21
> >> patch would not have resolved your issue.  The p21 patch ensures
> >> that phylink_resolve() gets called and completes before phylink_stop()
> >> returns.  In that case, phylink_resolve() will call the mac_link_down()
> >> method if the link is not already down.  It will also print the "Link
> >> is Down" message.
> >>
> >> Florian has already tested this patch after encountering a similar
> >> issue, and has reported that it solves the problem for him.  I've also
> >> tested it with mvneta, and the original mvpp2x driver on Macchiatobin.
> >>
> >> Maybe there's something different about mvpp2, but as I have no
> >> visibility of that driver and the modifications therein, I can't
> >> comment further other than stating that it works for three different
> >> implementations.
> >>
> >> Maybe you could try and work out what's going on with the p21 patch
> >> in your case?
> > 
> > I think I now realise what's probably going on.
> > 
> > If you call netif_carrier_off() before phylink_stop(), then phylink will
> > believe that the link is already down, and so it won't bother calling
> > mac_link_down().
> > 
> > I'll update the documentation for phylink_stop() to spell out this
> > aspect.
> > 
> 
> There is a pretty high number of net drivers which call
> 	netif_carrier_off(dev);
> before
> 	phy_stop(dev->phydev);
> in .ndo_stop() callback.
> 
> As per your comment this seems to be incorrect, so should such calls be
> removed?

Well, I think the question that needs to be asked is this:

  Is calling netif_carrier_off() before phy_stop() safe?

Reading the phylib code, this is the answer I've come to:

  Between phy_start() and phy_stop(), phylib is free to manage the
  carrier state itself through the phylib state machine.

  This means if you call netif_carrier_off() prior to phy_stop(),
  there is nothing preventing the phylib state machine from running,
  and a coincident poll of the PHY could notice that the link has
  come up, and re-enable the carrier while your ndo_stop() method
  is still running.

So, my conclusion is that this practice is provably racy, though
the race is likely hard to trigger (which is probably why no one
has reported the problem).
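
To make the interleaving concrete, here is a toy user-space model in C.
It is only a sketch: none of the toy_* names exist in the kernel, the
"poll" is a plain function call rather than the real phylib state
machine, and the model assumes that stopping the PHY simply leaves the
carrier off.  It just shows how the order of the two calls decides
whether a coincident poll can turn the carrier back on part-way through
the stop path.

```c
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not kernel APIs. */
struct toy_dev {
	bool phy_started;  /* true between the toy phy_start() and phy_stop() */
	bool phy_link;     /* link state the PHY would report if polled */
	bool carrier;      /* carrier state as the net stack would see it */
};

/* Stand-in for netif_carrier_off(): just clears the carrier flag. */
static void toy_carrier_off(struct toy_dev *d)
{
	d->carrier = false;
}

/* Stand-in for one pass of the PHY poll.  While the PHY is started,
 * the poll owns the carrier and copies the PHY link state into it. */
static void toy_phy_poll(struct toy_dev *d)
{
	if (d->phy_started)
		d->carrier = d->phy_link;
}

/* Stand-in for phy_stop(): afterwards the poll no longer touches the
 * carrier.  Dropping the carrier here is a simplifying assumption. */
static void toy_phy_stop(struct toy_dev *d)
{
	d->phy_started = false;
	d->carrier = false;
}

/* Racy order: carrier off first, then a poll sneaks in before the stop.
 * Returns the carrier state observed part-way through the stop path. */
static bool toy_stop_racy(struct toy_dev *d)
{
	bool carrier_mid_stop;

	toy_carrier_off(d);
	toy_phy_poll(d);               /* coincident poll sees the link up */
	carrier_mid_stop = d->carrier; /* carrier has been re-enabled */
	toy_phy_stop(d);
	return carrier_mid_stop;
}

/* Other order: stop the PHY first, so a later poll is a no-op. */
static bool toy_stop_safe(struct toy_dev *d)
{
	toy_phy_stop(d);
	toy_phy_poll(d);               /* PHY is stopped: nothing happens */
	return d->carrier;
}
```

Starting from a device with phy_started, phy_link and carrier all true,
toy_stop_racy() returns true (the carrier came back while the stop path
was still running), whereas toy_stop_safe() returns false.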

Given that it's racy, I don't think it's something phylink should
care about; if anything, phylink should "softly" discourage it.  So,
I'm happy with what phylink is doing here, and I suggest fixing the
drivers for this race.

In any case, it should result in less code in the drivers - since
the work you need to do when the link goes down is a subset of the
work you need to do when the network interface is taken down.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up
