netdev - Re: [PATCH net] net: phy: make phy_error() report which PHY has failed

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ac9277e0-7141-a0ce-f516-cd596b00f10d@gmail.com>
Date:   Fri, 20 Dec 2019 23:28:33 +0100
From:   Heiner Kallweit <hkallweit1@...il.com>
To:     Florian Fainelli <f.fainelli@...il.com>,
        Russell King - ARM Linux admin <linux@...linux.org.uk>
Cc:     Andrew Lunn <andrew@...n.ch>,
        "David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org
Subject: Re: [PATCH net] net: phy: make phy_error() report which PHY has
 failed

On 20.12.2019 19:46, Florian Fainelli wrote:
> On 12/19/19 9:06 AM, Russell King - ARM Linux admin wrote:
>> On Thu, Dec 19, 2019 at 08:10:21AM +0100, Heiner Kallweit wrote:
>>> On 18.12.2019 23:09, Russell King - ARM Linux admin wrote:
>>>> On Wed, Dec 18, 2019 at 09:54:32PM +0100, Heiner Kallweit wrote:
>>>>> On 18.12.2019 00:34, Russell King - ARM Linux admin wrote:
>>>>>> On Tue, Dec 17, 2019 at 10:41:34PM +0100, Heiner Kallweit wrote:
>>>>>>> On 17.12.2019 13:53, Russell King wrote:
>>>>>>>> phy_error() is called from phy_interrupt() or phy_state_machine(), and
>>>>>>>> uses WARN_ON() to print a backtrace. The backtrace is not useful when
>>>>>>>> reporting a PHY error.
>>>>>>>>
>>>>>>>> However, a system may contain multiple ethernet PHYs, and phy_error()
>>>>>>>> gives no clue which one caused the problem.
>>>>>>>>
>>>>>>>> Replace WARN_ON() with a call to phydev_err() so that we can see which
>>>>>>>> PHY had an error, and also inform the user that we are halting the PHY.
>>>>>>>>
>>>>>>>> Fixes: fa7b28c11bbf ("net: phy: print stack trace in phy_error")
>>>>>>>> Signed-off-by: Russell King <rmk+kernel@...linux.org.uk>
>>>>>>>> ---
>>>>>>>> There is another related problem in this area. If an error is detected
>>>>>>>> while the PHY is running, phy_error() moves to PHY_HALTED state. If we
>>>>>>>> try to take the network device down, then:
>>>>>>>>
>>>>>>>> void phy_stop(struct phy_device *phydev)
>>>>>>>> {
>>>>>>>>         if (!phy_is_started(phydev)) {
>>>>>>>>                 WARN(1, "called from state %s\n",
>>>>>>>>                      phy_state_to_str(phydev->state));
>>>>>>>>                 return;
>>>>>>>>         }
>>>>>>>>
>>>>>>>> triggers, and we never do any of the phy_stop() cleanup. I'm not sure
>>>>>>>> what the best way to solve this is - introducing a PHY_ERROR state may
>>>>>>>> be a solution, but I think we want some phy_is_started() sites to
>>>>>>>> return true for it and others to return false.
>>>>>>>>
>>>>>>>> Heiner - you introduced the above warning, could you look at improving
>>>>>>>> this case so we don't print a warning and taint the kernel when taking
>>>>>>>> a network device down after phy_error() please?
>>>>>>>>
>>>>>>> I think we need both types of information:
>>>>>>> - the affected PHY device
>>>>>>> - the stack trace to see where the issue was triggered
>>>>>>
>>>>>> Can you please explain why the stack trace is useful.  For the paths
>>>>>> that are reachable, all it tells you is whether it was reached via
>>>>>> the interrupt or the workqueue.
>>>>>>
>>>>>> If it's via the interrupt, the rest of the backtrace beyond that is
>>>>>> irrelevant.  If it's the workqueue, the backtrace doesn't go back
>>>>>> very far, and doesn't tell you what operation triggered it.
>>>>>>
>>>>>> If it's important to see where or why phy_error() was called, there
>>>>>> are much better ways of doing that, notably passing a string into
>>>>>> phy_error() to describe the actual error itself.  That would convey
>>>>>> way more useful information than the backtrace does.
>>>>>>
>>>>>> I have been faced with these backtraces, and they have not been at
>>>>>> all useful for diagnosing the problem.
>>>>>>
>>>>> "The problem" comes in two flavors:
>>>>> 1. The problem that caused the PHY error
>>>>> 2. The problem caused by the PHY error (if we decide to not
>>>>>    always switch to HALTED state)
>>>>>
>>>>> We can't do much for case 1, maybe we could add an errno argument
>>>>> to phy_error(). To facilitate analyzing case 2 we'd need to change
>>>>> code pieces like the following.
>>>>>
>>>>> case a:
>>>>> err = f1();
>>>>> case b:
>>>>> err = f2();
>>>>>
>>>>> if (err)
>>>>> 	phy_error()
>>>>>
>>>>> For my understanding: What caused the PHY error in your case(s)?
>>>>> Which info would have been useful for analyzing the error?
>>>>
>>>> Errors reading/writing from the PHY.
>>>>
>>>> The problem with a backtrace from phy_error() is it doesn't tell you
>>>> where the error actually occurred, it only tells you where the error
>>>> is reported - which is one of two different paths at the moment.
>>>> That can be achieved with much more elegance and simplicity by
>>>> passing a string into phy_error() to describe the call site if that's
>>>> even relevant.
>>>>
>>>> I would say, however, that knowing where the error occurred would be
>>>> far better information.
>>>>
>>> AFAICS PHY errors are typically forwarded MDIO access errors.
>>> PHY driver callback implementations could add own error sources,
>>> but from what I've seen they don't. Instead of the backtrace in
>>> phy_error() we could add a WARN_ONCE() to __mdiobus_read/write.
>>> Then the printed call chain should be more useful.
>>> If somebody wants to analyze in more detail, he can switch on
>>> MDIO access tracing.
>>
>> I'm still not clear why you're so keen to trigger a kernel warning
>> on one of these events.
>>
>> Errors may _legitimately_ occur when trying to read/write a PHY. For
>> example, it would be completely mad for the kernel to WARN and taint
>> itself just because you've unplugged a SFP module just at the time
>> that phylib is trying to poll the PHY on-board, and that caused an
>> failure to read/write the PHY. You just need the right timing to
>> trigger this.
>>

When the trace was added we didn't consider the fact that there may
me legitimate errors. And sure, WARNing just because a SFP has been
removed is inappropriate. Having said that I'm fine with removing the
trace. Then phy_error() should just be extended to also print the
errno of the failed operation and give a hint which operation failed.

>> When a SFP module is unplugged the three contacts that comprise the
>> I2C bus (used for communicating with a PHY that may be there) and
>> the pin that identifies that the module is present all break at about
>> the same point in time (give or take some minor tolerances) so there
>> is no way to definitively say "yes, the PHY is still present, we can
>> talk to it" by testing something.
>>
> 
> For instance, here is a resume failure because of incorrect pinmuxing,
> you can see that the piece of useful information is not from the stack
> trace, but right under:
> 
> [   39.637976] ------------[ cut here ]------------
> [   39.637995] WARNING: CPU: 0 PID: 29 at drivers/net/phy/phy.c:657
> phy_error+0x34/0x6c
> [   39.637998] Modules linked in:
> [   39.638006] CPU: 0 PID: 29 Comm: kworker/0:1 Not tainted 5.5.0-rc2 #23
> [   39.638007] Hardware name: Broadcom STB (Flattened Device Tree)
> [   39.638013] Workqueue: events_power_efficient phy_state_machine
> [   39.638015] Backtrace:
> [   39.638021] [<4020df50>] (dump_backtrace) from [<4020e254>]
> (show_stack+0x20/0x24)
> [   39.638023]  r7:41ca90c8 r6:00000000 r5:60000153 r4:41ca90c8
> [   39.638030] [<4020e234>] (show_stack) from [<40b98404>]
> (dump_stack+0xb8/0xe4)
> [   39.638035] [<40b9834c>] (dump_stack) from [<40226220>]
> (__warn+0xec/0x104)
> [   39.638038]  r10:ed7ab605 r9:4080ba38 r8:00000291 r7:00000009
> r6:40da88ec r5:00000000
> [   39.638039]  r4:00000000 r3:189dae12
> [   39.638043] [<40226134>] (__warn) from [<402262f4>]
> (warn_slowpath_fmt+0xbc/0xc4)
> [   39.638045]  r9:00000009 r8:4080ba38 r7:00000291 r6:40da88ec
> r5:00000000 r4:e9232000
> [   39.638049] [<4022623c>] (warn_slowpath_fmt) from [<4080ba38>]
> (phy_error+0x34/0x6c)
> [   39.638051]  r9:41cb14b0 r8:fffffffb r7:e80d3400 r6:00000003
> r5:e80d36dc r4:e80d3400
> [   39.638055] [<4080ba04>] (phy_error) from [<4080ccc0>]
> (phy_state_machine+0x114/0x1c0)
> [   39.638056]  r5:e80d36dc r4:e80d36b0
> [   39.638060] [<4080cbac>] (phy_state_machine) from [<40243ab4>]
> (process_one_work+0x240/0x57c)
> [   39.638063]  r8:00000000 r7:ed7ab600 r6:ed7a8040 r5:e90ff880 r4:e80d36b0
> [   39.638065] [<40243874>] (process_one_work) from [<402442b0>]
> (worker_thread+0x58/0x5f4)
> [   39.638068]  r10:e9232000 r9:ed7a8058 r8:41c03d00 r7:00000008
> r6:e90ff894 r5:ed7a8040
> [   39.638069]  r4:e90ff880
> [   39.638073] [<40244258>] (worker_thread) from [<4024b050>]
> (kthread+0x148/0x174)
> [   39.638076]  r10:e914be74 r9:40244258 r8:e90ff880 r7:e9232000
> r6:00000000 r5:e9204e40
> [   39.638077]  r4:e91940c0
> [   39.638081] [<4024af08>] (kthread) from [<402010ac>]
> (ret_from_fork+0x14/0x28)
> [   39.638083] Exception stack(0xe9233fb0 to 0xe9233ff8)
> [   39.638085] 3fa0:                                     00000000
> 00000000 00000000 00000000
> [   39.638087] 3fc0: 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000
> [   39.638089] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [   39.638092]  r10:00000000 r9:00000000 r8:00000000 r7:00000000
> r6:00000000 r5:4024af08
> [   39.638093]  r4:e9204e40
> [   39.638095] ---[ end trace 10eeee4f71649fc9 ]---
> [   39.639134] PM: dpm_run_callback(): mdio_bus_phy_resume+0x0/0x64
> returns -5
> [   39.639139] PM: Device unimac-mdio-0:01 failed to resume: error -5
> 
>