lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <VI1PR0402MB2800068E2D6880BE1930DE53E01F0@VI1PR0402MB2800.eurprd04.prod.outlook.com>
Date:   Wed, 29 May 2019 16:11:57 +0000
From:   Ioana Ciornei <ioana.ciornei@....com>
To:     "linux@...linux.org.uk" <linux@...linux.org.uk>,
        "f.fainelli@...il.com" <f.fainelli@...il.com>,
        "andrew@...n.ch" <andrew@...n.ch>
CC:     "hkallweit1@...il.com" <hkallweit1@...il.com>,
        "maxime.chevallier@...tlin.com" <maxime.chevallier@...tlin.com>,
        "olteanv@...il.com" <olteanv@...il.com>,
        "thomas.petazzoni@...tlin.com" <thomas.petazzoni@...tlin.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "vivien.didelot@...il.com" <vivien.didelot@...il.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "lkp@...org" <lkp@...org>,
        kernel test robot <rong.a.chen@...el.com>
Subject: RE: [net] 9dd6d07682: kernel_BUG_at_drivers/net/phy/mdio_bus.c

> Subject: [net] 9dd6d07682: kernel_BUG_at_drivers/net/phy/mdio_bus.c
> 
> FYI, we noticed the following commit (built with gcc-6):
> 
> commit: 9dd6d07682b10a55d1f49d495b85f7b945ff75ca ("[PATCH 10/11] net:
> dsa: Use PHYLINK for the CPU/DSA ports")
> url:
> 
> 
> in testcase: boot
> 
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m
> 2G
> 
> caused below changes (please refer to attached dmesg/kmsg for entire
> log/backtrace):
> 
> 
> +------------------------------------------+------------+------------+
> |                                          | 3a2573f868 | 9dd6d07682 |
> +------------------------------------------+------------+------------+
> | boot_successes                           | 4          | 0          |
> | boot_failures                            | 0          | 6          |
> | kernel_BUG_at_drivers/net/phy/mdio_bus.c | 0          | 6          |
> | invalid_opcode:#[##]                     | 0          | 6          |
> | EIP:mdiobus_free                         | 0          | 6          |
> | Kernel_panic-not_syncing:Fatal_exception | 0          | 6          |
> +------------------------------------------+------------+------------+
> 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <rong.a.chen@...el.com>
> 
> 
> [   39.031781] kernel BUG at drivers/net/phy/mdio_bus.c:503!
> [   39.049792] invalid opcode: 0000 [#1] PREEMPT
> [   39.058345] CPU: 0 PID: 152 Comm: kworker/0:2 Tainted: G                T 5.2.0-
> rc1-00321-g9dd6d07 #1
> [   39.076106] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> 1.10.2-1 04/01/2014
> [   39.091893] Workqueue: events deferred_probe_work_func
> [   39.101903] EIP: mdiobus_free+0x21/0x39
> [   39.109323] Code: 18 5f ed ff 5b 5e 5f 5d c3 55 89 e5 52 89 45 fc 8b 45 fc 8b
> 90 9c 00 00 00 83 fa 01 75 07 e8 9b fa 99 ff eb 1b 83 fa 03 74 02 <0f> 0b c7 80 9c
> 00 00 00 04 00 00 00 05 a0 00 00 00 e8 94 58 ed ff
> [   39.144715] EAX: eff3e008 EBX: eff3a020 ECX: c23a9de9 EDX: 00000002
> [   39.156773] ESI: f0403560 EDI: efffbe24 EBP: efffbdf8 ESP: efffbdf4
> [   39.168825] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010297
> [   39.182048] CR0: 80050033 CR2: 00000000 CR3: 02781000 CR4: 000006b0
> [   39.194136] Call Trace:
> [   39.198932]  _devm_mdiobus_free+0xd/0x10
> [   39.206573]  release_nodes+0x194/0x1ad
> [   39.213804]  devres_release_all+0x37/0x3d
> [   39.221709]  really_probe+0x2b4/0x3a3
> [   39.228808]  driver_probe_device+0x110/0x14b
> [   39.237206]  __device_attach_driver+0x9d/0xa5
> [   39.245622]  bus_for_each_drv+0x65/0x77
> [   39.253081]  __device_attach+0x8f/0x104
> [   39.260507]  ? driver_allows_async_probing+0x26/0x26
> [   39.270161]  device_initial_probe+0x14/0x16
> [   39.278223]  bus_probe_device+0x22/0x64
> [   39.285645]  deferred_probe_work_func+0x7b/0xa1
> [   39.294428]  process_one_work+0x1bc/0x2eb
> [   39.302332]  ? process_one_work+0x164/0x2eb
> [   39.310539]  process_scheduled_works+0x1e/0x24
> [   39.319218]  worker_thread+0x1cb/0x268
> [   39.326604]  kthread+0xeb/0xf0
> [   39.332607]  ? process_scheduled_works+0x24/0x24
> [   39.341641]  ? __kthread_create_on_node+0x128/0x128
> [   39.350985]  ret_from_fork+0x1e/0x28
> [   39.369743] ---[ end trace 2d9c21baf7b99d11 ]---
> 

Hi,

Just to give more context onto what's the path that reaches the BUG_ON, here is the last part of the dmesg before the crash:

[   38.772573] dsa-loop fixed-0:1f: DSA mockup driver: 0x1f
[   38.790366] libphy: dsa slave smi: probed
[   38.799285] dsa-loop fixed-0:1f lan1 (uninitialized): PHY [dsa-0.0:00] driver [Generic PHY]
[   38.815725] dsa-loop fixed-0:1f lan1 (uninitialized): phy: setting supported 00,00000000,000062c8 advertising 00,00000000,000062c8
[   38.856337] dsa-loop fixed-0:1f lan2 (uninitialized): PHY [dsa-0.0:01] driver [Generic PHY]
[   38.872670] dsa-loop fixed-0:1f lan2 (uninitialized): phy: setting supported 00,00000000,000062c8 advertising 00,00000000,000062c8
[   38.903821] dsa-loop fixed-0:1f lan3 (uninitialized): PHY [dsa-0.0:02] driver [Generic PHY]
[   38.920337] dsa-loop fixed-0:1f lan3 (uninitialized): phy: setting supported 00,00000000,000062c8 advertising 00,00000000,000062c8
[   38.952873] dsa-loop fixed-0:1f lan4 (uninitialized): PHY [dsa-0.0:03] driver [Generic PHY]
[   38.969462] dsa-loop fixed-0:1f lan4 (uninitialized): phy: setting supported 00,00000000,000062c8 advertising 00,00000000,000062c8
[   39.002349] could not attach to PHY: -19
[   39.010359] dsa-loop fixed-0:1f: failed to setup link for port 0.5

The dsa-loop mockup driver is trying to register a switch using information based on the dsa_chip_data structure rather than on DTS.
This, of course, leads to PHYLINK being unable to successfully call phylink_of_phy_connect() on a NULL device_node.

Rewinding the stack, we see that dsa_tree_setup() fails and that calling dsa_tree_remove_switch() should take care of removing/unregistering any resources previously allocated.
This does not happen.

Of special interest here is unregistering the MDIO bus which was registered in dsa_switch_setup().
The mdiobus_unregister() is never called because it is conditioned by dts->setup being true, which is set only after _all_ setup steps were performed successfully.
This leads the code to the part where it tries to free an MDIO bus which was not unregistered properly, which is exactly this BUG_ON.

The immediate fix for this is to use PHYLINK on the CPU port only when the device_node associated is not NULL.
As a separate patch from this series, I will also try to fix the initial bug.

--
Ioana

> 
> To reproduce:
> 
>         # build kernel
> 	cd linux
> 	cp config-5.2.0-rc1-00321-g9dd6d07 .config
> 	make HOSTCC=gcc-6 CC=gcc-6 ARCH=i386 olddefconfig
> 	make HOSTCC=gcc-6 CC=gcc-6 ARCH=i386 prepare
> 	make HOSTCC=gcc-6 CC=gcc-6 ARCH=i386 modules_prepare
> 	make HOSTCC=gcc-6 CC=gcc-6 ARCH=i386 SHELL=/bin/bash
> 	make HOSTCC=gcc-6 CC=gcc-6 ARCH=i386 bzImage
> 
> 
>         git clone https://github.com/intel/lkp-tests.git 
>         cd lkp-tests
>         bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email
> 
> 
> 
> Thanks,
> Rong Chen

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ