netdev - Re: [PATCH net-next] net: phy: print error message in phy

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <85c68d40-e539-f619-845e-4bf7ed24dcce@gmail.com>
Date:   Sun, 16 Dec 2018 19:11:44 +0100
From:   Heiner Kallweit <hkallweit1@...il.com>
To:     Florian Fainelli <f.fainelli@...il.com>,
        Andrew Lunn <andrew@...n.ch>,
        David Miller <davem@...emloft.net>
Cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next] net: phy: print error message in phy_error

On 16.12.2018 18:55, Florian Fainelli wrote:
> Le 12/16/18 à 9:16 AM, Heiner Kallweit a écrit :
>> On 16.12.2018 18:02, Florian Fainelli wrote:
>>> Le 12/16/18 à 7:52 AM, Heiner Kallweit a écrit :
>>>> So far phy_error() silently stops the PHY state machine. If the network
>>>> driver doesn't inform about a  MDIO error then the user may wonder why
>>>> his network is down. So let's inform the user.
>>>>
>>>> Signed-off-by: Heiner Kallweit <hkallweit1@...il.com>
>>>
>>> Reviewed-by: Florian Fainelli <f.fainelli@...il.com>
>>>
>>> Kind of similar to the netdev watchdog on a TX timeout, I wonder if we
>>> should not just put a WARN() here to get a complete stack trace to help
>>> debug those events?
>>>
>> AFAICS phy_error() is always (?) caused by a MDIO access error.
>> Does it help us to know in which code path the MDIO error occurred?
> 
> I would say yes because if you have things like nested MDIO bus accesses
> (as can happen with Ethernet switches/DSA) or if runtime PM got in the
> way, you would be able to know that.
> 
Good, convinced. So I'll provide a v2.

>>
>> And maybe the stack trace in case of a tx timeout isn't the best
>> example, at least from my personal experience. I dealt with such cases
>> and the stack trace never helped. The root cause always was in a totally
>> different place (wrong chip tx configuration etc.)
>> But maybe there are other cases where it's useful.
> 
> That is entirely true, debugging TX timeouts is definitively no fun and
> the watchdog does not help at all, other than making users scream and
> shout :)
> 
>>
>>>> ---
>>>>  drivers/net/phy/phy.c | 2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
>>>> index 890ae1d73..a898fa411 100644
>>>> --- a/drivers/net/phy/phy.c
>>>> +++ b/drivers/net/phy/phy.c
>>>> @@ -739,6 +739,8 @@ static void phy_error(struct phy_device *phydev)
>>>>  	phydev->state = PHY_HALTED;
>>>>  	mutex_unlock(&phydev->lock);
>>>>  
>>>> +	phydev_err(phydev, "stopping PHY state machine due to error\n");
>>>> +
>>>>  	phy_trigger_machine(phydev);
>>>>  }
>>>>  
>>>>
>>>
>>>
>>
> 
>