lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 29 May 2019 20:08:32 +0000
From:   Ioana Ciornei <ioana.ciornei@....com>
To:     Russell King - ARM Linux admin <linux@...linux.org.uk>
CC:     "f.fainelli@...il.com" <f.fainelli@...il.com>,
        "andrew@...n.ch" <andrew@...n.ch>,
        "hkallweit1@...il.com" <hkallweit1@...il.com>,
        "maxime.chevallier@...tlin.com" <maxime.chevallier@...tlin.com>,
        "olteanv@...il.com" <olteanv@...il.com>,
        "thomas.petazzoni@...tlin.com" <thomas.petazzoni@...tlin.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "vivien.didelot@...il.com" <vivien.didelot@...il.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "lkp@...org" <lkp@...org>,
        kernel test robot <rong.a.chen@...el.com>
Subject: Re: [net] 9dd6d07682: kernel_BUG_at_drivers/net/phy/mdio_bus.c

On 5/29/19 7:25 PM, Russell King - ARM Linux admin wrote:
> On Wed, May 29, 2019 at 04:11:57PM +0000, Ioana Ciornei wrote:
>>> Subject: [net] 9dd6d07682: kernel_BUG_at_drivers/net/phy/mdio_bus.c
>>>
>>> FYI, we noticed the following commit (built with gcc-6):
>>>
>>> commit: 9dd6d07682b10a55d1f49d495b85f7b945ff75ca ("[PATCH 10/11] net:
>>> dsa: Use PHYLINK for the CPU/DSA ports")
>>> url:
>>>
>>>
>>> in testcase: boot
>>>
>>> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m
>>> 2G
>>>
>>> caused below changes (please refer to attached dmesg/kmsg for entire
>>> log/backtrace):
>>>
>>>
>>> +------------------------------------------+------------+------------+
>>> |                                          | 3a2573f868 | 9dd6d07682 |
>>> +------------------------------------------+------------+------------+
>>> | boot_successes                           | 4          | 0          |
>>> | boot_failures                            | 0          | 6          |
>>> | kernel_BUG_at_drivers/net/phy/mdio_bus.c | 0          | 6          |
>>> | invalid_opcode:#[##]                     | 0          | 6          |
>>> | EIP:mdiobus_free                         | 0          | 6          |
>>> | Kernel_panic-not_syncing:Fatal_exception | 0          | 6          |
>>> +------------------------------------------+------------+------------+
>>>
>>>
>>> If you fix the issue, kindly add following tag
>>> Reported-by: kernel test robot <rong.a.chen@...el.com>
>>>
>>>
>>> [   39.031781] kernel BUG at drivers/net/phy/mdio_bus.c:503!
>>> [   39.049792] invalid opcode: 0000 [#1] PREEMPT
>>> [   39.058345] CPU: 0 PID: 152 Comm: kworker/0:2 Tainted: G                T 5.2.0-
>>> rc1-00321-g9dd6d07 #1
>>> [   39.076106] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>>> 1.10.2-1 04/01/2014
>>> [   39.091893] Workqueue: events deferred_probe_work_func
>>> [   39.101903] EIP: mdiobus_free+0x21/0x39
>>> [   39.109323] Code: 18 5f ed ff 5b 5e 5f 5d c3 55 89 e5 52 89 45 fc 8b 45 fc 8b
>>> 90 9c 00 00 00 83 fa 01 75 07 e8 9b fa 99 ff eb 1b 83 fa 03 74 02 <0f> 0b c7 80 9c
>>> 00 00 00 04 00 00 00 05 a0 00 00 00 e8 94 58 ed ff
>>> [   39.144715] EAX: eff3e008 EBX: eff3a020 ECX: c23a9de9 EDX: 00000002
>>> [   39.156773] ESI: f0403560 EDI: efffbe24 EBP: efffbdf8 ESP: efffbdf4
>>> [   39.168825] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010297
>>> [   39.182048] CR0: 80050033 CR2: 00000000 CR3: 02781000 CR4: 000006b0
>>> [   39.194136] Call Trace:
>>> [   39.198932]  _devm_mdiobus_free+0xd/0x10
>>> [   39.206573]  release_nodes+0x194/0x1ad
>>> [   39.213804]  devres_release_all+0x37/0x3d
>>> [   39.221709]  really_probe+0x2b4/0x3a3
>>> [   39.228808]  driver_probe_device+0x110/0x14b
>>> [   39.237206]  __device_attach_driver+0x9d/0xa5
>>> [   39.245622]  bus_for_each_drv+0x65/0x77
>>> [   39.253081]  __device_attach+0x8f/0x104
>>> [   39.260507]  ? driver_allows_async_probing+0x26/0x26
>>> [   39.270161]  device_initial_probe+0x14/0x16
>>> [   39.278223]  bus_probe_device+0x22/0x64
>>> [   39.285645]  deferred_probe_work_func+0x7b/0xa1
>>> [   39.294428]  process_one_work+0x1bc/0x2eb
>>> [   39.302332]  ? process_one_work+0x164/0x2eb
>>> [   39.310539]  process_scheduled_works+0x1e/0x24
>>> [   39.319218]  worker_thread+0x1cb/0x268
>>> [   39.326604]  kthread+0xeb/0xf0
>>> [   39.332607]  ? process_scheduled_works+0x24/0x24
>>> [   39.341641]  ? __kthread_create_on_node+0x128/0x128
>>> [   39.350985]  ret_from_fork+0x1e/0x28
>>> [   39.369743] ---[ end trace 2d9c21baf7b99d11 ]---
>>>
>>
>> Hi,
>>
>> Just to give more context onto what's the path that reaches the BUG_ON, here is the last part of the dmesg before the crash:
>>
>> [   38.772573] dsa-loop fixed-0:1f: DSA mockup driver: 0x1f
>> [   38.790366] libphy: dsa slave smi: probed
>> [   38.799285] dsa-loop fixed-0:1f lan1 (uninitialized): PHY [dsa-0.0:00] driver [Generic PHY]
>> [   38.815725] dsa-loop fixed-0:1f lan1 (uninitialized): phy: setting supported 00,00000000,000062c8 advertising 00,00000000,000062c8
>> [   38.856337] dsa-loop fixed-0:1f lan2 (uninitialized): PHY [dsa-0.0:01] driver [Generic PHY]
>> [   38.872670] dsa-loop fixed-0:1f lan2 (uninitialized): phy: setting supported 00,00000000,000062c8 advertising 00,00000000,000062c8
>> [   38.903821] dsa-loop fixed-0:1f lan3 (uninitialized): PHY [dsa-0.0:02] driver [Generic PHY]
>> [   38.920337] dsa-loop fixed-0:1f lan3 (uninitialized): phy: setting supported 00,00000000,000062c8 advertising 00,00000000,000062c8
>> [   38.952873] dsa-loop fixed-0:1f lan4 (uninitialized): PHY [dsa-0.0:03] driver [Generic PHY]
>> [   38.969462] dsa-loop fixed-0:1f lan4 (uninitialized): phy: setting supported 00,00000000,000062c8 advertising 00,00000000,000062c8
>> [   39.002349] could not attach to PHY: -19
>> [   39.010359] dsa-loop fixed-0:1f: failed to setup link for port 0.5
>>
>> The dsa-loop mockup driver is trying to register a switch using information based on the dsa_chip_data structure rather than on DTS.
>> This, of course, leads to PHYLINK being unable to successfully call phylink_of_phy_connect() on a NULL device_node.
> 
> Right, phylink_of_phy_connect() is not supposed to be called with a NULL
> device node, but if it is, it fails gracefull
 >
>> Rewinding the stack, we see that dsa_tree_setup() fails and that calling dsa_tree_remove_switch() should take care of removing/unregistering any resources previously allocated.
>> This does not happen.
>>
>> Of special interest here is unregistering the MDIO bus which was registered in dsa_switch_setup().
>> The mdiobus_unregister() is never called because it is conditioned by dts->setup being true, which is set only after _all_ setup steps were performed successfully.
>> This leads the code to the part where it tries to free an MDIO bus which was not unregistered properly, which is exactly this BUG_ON.
>>
>> The immediate fix for this is to use PHYLINK on the CPU port only when the device_node associated is not NULL.
>> As a separate patch from this series, I will also try to fix the initial bug.
> 
> Yep, it sounds like a short-coming of the error handling, which should
> be the first priority to fix; avoiding PHYLINK on the CPU port when
> the device_node is NULL seems to me like papering over a bug.
I will add a patch that fixes the error path. I agree that this is 
important.

However, this will not fix dsa-loop (or any other driver that doesn't 
probe on OF bindings) since the phylink_of_phy_connect() will still fail 
on the CPU port. From what I see in dsa_port_setup_phy_of(), the error 
was ignored previously:

    phydev = dsa_port_get_phy_device(dp);
    if (!phydev)
          return 0;

I don't see how this method is sane either.

--
Ioana

> 
> However, should mention that I have been carrying a patch for some time
> for ZII boards to disable the error path for "no PHY" in
> dsa_slave_phy_setup() - iow, when phylink_of_phy_connect() returns
> -ENODEV and dsa_slave_phy_connect() also errors out.  That was the only
> way I could get the SFF modules to work there.
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ