lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <843eb3cd-3cc0-0b06-2aa8-709f25308437@free.fr>
Date:   Tue, 25 Jul 2017 00:53:44 +0200
From:   Mason <slash.tmp@...e.fr>
To:     Florian Fainelli <f.fainelli@...il.com>,
        Andrew Lunn <andrew@...n.ch>, Mans Rullgard <mans@...sr.com>
Cc:     netdev <netdev@...r.kernel.org>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>
Subject: Re: Problem with PHY state machine when using interrupts

On 25/07/2017 00:36, Florian Fainelli wrote:
> On 07/24/2017 02:20 PM, Mason wrote:
>> On 24/07/2017 21:53, Florian Fainelli wrote:
>>
>>> Well now that I see the possible interrupts generated, I indeed don't
>>> see how you can get a link down notification unless you somehow force
>>> the link down yourself, which would certainly happen in phy_suspend()
>>> when we set BMCR.pwrdwn, but that may be too late.
>>>
>>> You should still expect the adjust_link() function to be called though
>>> with PHY_HALTED being set and that takes care of doing phydev->link = 0
>>> and netif_carrier_off(). If that still does not work, then see whether
>>> removing the call to phy_stop() does help (it really should).
>>
>> The only functions setting phydev->state to PHY_HALTED
>> are phy_error() and phy_stop() AFAICT.
>>
>> I am aware that when phy_state_machine() handles the
>> PHY_HALTED state, it will set phydev->link = 0;
>> and call netif_carrier_off() -- because that's where
>> I copied that code from.
>>
>> My issue is that phy_state_machine() does not run when
>> I run 'ip set link dev eth0 down' from the command line.
> 
> Yes, that much is clear, which is why I suggested earlier you try the
> patch at the end now.
> 
>>
>> If I'm reading the code right, phy_disconnect() actually
>> stops the state machine.
>>
>> In interrupt mode, phy_state_machine() doesn't run
>> because no interrupt is generated.
>>
>> In polling mode, phy_state_machine() doesn't run
>> because phy_disconnect() stops the state machine.
>>
>> Introducing a sleep before phy_disconnect() gives
>> the state machine a chance to run in polling mode,
>> but it doesn't feel right, and doesn't fix the
>> other mode, which I'm using.
> 
> There are several problems it seems:
> 
> - the PHY state machine cannot move solely based on the PHY generating
> interrupts during phy_stop() if none of the changing conditions for the
> HW have changed, come to think about it, I doubt any PHY would be
> capable of doing something like that
> 
> - there is an expectation from your driver to have adjust_link() run
> sometime during ndo_stop() to do something, but why?
> 
> What is special about nb8800 that interrupts should be generated during
> ndo_stop(), and why do you think this is going to solve your problem?
> 
>>
>> Looking at bcm_enet_stop() it calls phy_stop() and
>> phy_disconnect() just like the nb8800 driver...
>>
>> I'm stumped.
> 
> My suggestion of not using phy_stop() was not a good one, but
> functionally there is little difference in doing phy_stop() +
> phy_disconnect() or just phy_disconnect() alone. What matters is that we
> restart the PHY properly when phy_connect() or phy_start() is called.
> 
> What I understand now from your "bug report" is that you want
> adjust_link to run with phydev->link = 0 to do something during
> ndo_close() but you have not explained why, but I suspect such that upon
> ndo_open() things work again.
> 
> What I suspect your bug is, is that the really was no change in link
> status, no interrupt was generated because there should not be one, yet
> some of what nb8800_stop() does is not properly balanced by
> nb8800_open(). How about the following patch:
> 
> diff --git a/drivers/net/ethernet/aurora/nb8800.c
> b/drivers/net/ethernet/aurora/nb8800.c
> index 041cfb7952f8..b07dea3ab019 100644
> --- a/drivers/net/ethernet/aurora/nb8800.c
> +++ b/drivers/net/ethernet/aurora/nb8800.c
> @@ -972,6 +972,10 @@ static int nb8800_open(struct net_device *dev)
>         nb8800_mac_rx(dev, true);
>         nb8800_mac_tx(dev, true);
> 
> +       priv->speed = -1;
> +       priv->link = -1;
> +       priv->duplex = -1;
> +
>         phydev = of_phy_connect(dev, priv->phy_node,
>                                 nb8800_link_reconfigure, 0,
>                                 priv->phy_mode);
> 

I will test your two patches ASAP.

For the record, I have two (maybe related) issues:

1) After a sequence of
ip set link dev eth0 up
ip set link dev eth0 down
ip set link dev eth0 up
RX engine is hosed, thus network connectivity is borked.
The work-around is resetting the MAC in ndo_open
=> mac_init in my proposed patch.
This is by far the biggest issue.
Also, resetting the MAC in ndo_open will make it easy
to implement suspend/resume, as I can just ndo_stop
in suspend, and ndo_open in resume.

2) The system does not print "Link down" when I run
'ip set link dev eth0 down'
This is a symptom of nb8800_link_reconfigure()
not being called at all (or being called
with phydev->link == priv->link when I tried
skipping phy_stop)

Again, thanks for your patches and suggestions,
which I will test in the morning.

I will also try to understand why I didn't have
these issues on the other board...

Regards.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ