Message-ID: <6817efe1-f2c2-4686-bdf1-fca11f066e3a@arm.com>
Date: Fri, 25 Jul 2025 19:12:50 +0100
From: Robin Murphy <robin.murphy@....com>
To: Diederik de Haas <didi.debian@...ow.org>, Lee Jones <lee@...nel.org>,
 Pavel Machek <pavel@...nel.org>, Andrew Lunn <andrew+netdev@...n.ch>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>
Cc: linux-leds@...r.kernel.org, netdev@...r.kernel.org,
 linux-arm-kernel@...ts.infradead.org, linux-rockchip@...ts.infradead.org,
 linux-kernel@...r.kernel.org
Subject: Re: BUG: Circular locking dependency on netdev led trigger on NanoPi
 R5S

On 2025-07-25 6:48 pm, Diederik de Haas wrote:
> Hi,
> 
> I have a FriendlyELEC NanoPi R5S (with rk3568 SoC), and in commit
> 1631cbdb8089 ("arm64: dts: rockchip: Improve LED config for NanoPi R5S")
> I tried to improve its LED configuration, which included
> ``linux,default-trigger = "netdev"``
> 
> Problem: sometimes I got a 'hung task' error, which resulted in the WAN
> port (the only one I use) not coming up. Logging in via serial also
> didn't work, so pulling the plug was the only remedy.
> 
> Robin Murphy quickly identified that it likely had to do with LED
> triggers, and removing those netdev triggers made the problem go away[1].
> To find out what actually caused it, I built a kernel with PROVE_LOCKING
> and PRINTK_CALLER enabled, which, after adding a patch that fixed an
> oops [2], showed the underlying problem:

For the record, I think the actual deadlock condition Diederik's system 
hits in practice is a shorter cycle, wherein immediately after acquiring 
pernet_ops_rwsem, thread #0 then tries to take rtnl_mutex, which forms a 
straight inversion against thread #2 (which holds rtnl_mutex from 
devinet_ioctl()).
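
To make the ordering argument concrete, here is a minimal sketch (not
lockdep itself, just a toy "held -> then acquired" graph) that rebuilds
the cycle from the report quoted below. The edge list is my reading of
stages #0 to #3 of that report; the function and variable names are
hypothetical and only for illustration.

```python
from collections import defaultdict

# Existing dependencies, read off stages #1..#3 of the quoted report
# (an interpretation of the trace, not output from any kernel tool).
EXISTING = [
    ("pernet_ops_rwsem", "rtnl_mutex"),                 # #1 register_netdevice_notifier
    ("rtnl_mutex", "triggers_list_lock"),               # #2 devinet_ioctl ... led_trigger_register
    ("triggers_list_lock", "&led_cdev->trigger_lock"),  # #3 led_trigger_set_default
]

# New dependency from stage #0: modprobe holds trigger_lock and then
# calls register_netdevice_notifier, which takes pernet_ops_rwsem.
NEW = ("&led_cdev->trigger_lock", "pernet_ops_rwsem")


def find_cycle(edges):
    """Return one cycle as a list of lock names (first == last), or None."""
    graph = defaultdict(list)
    for held, acquired in edges:
        graph[held].append(acquired)

    path, finished = [], set()

    def dfs(lock):
        if lock in path:                      # back edge: we closed a loop
            return path[path.index(lock):] + [lock]
        if lock in finished:                  # already fully explored
            return None
        path.append(lock)
        for nxt in graph[lock]:
            cycle = dfs(nxt)
            if cycle:
                return cycle
        path.pop()
        finished.add(lock)
        return None

    for start in list(graph):
        cycle = dfs(start)
        if cycle:
            return cycle
    return None


if __name__ == "__main__":
    print(find_cycle(EXISTING))               # no cycle before the new edge
    print(" -> ".join(find_cycle(EXISTING + [NEW])))
```

Without the new edge the graph is acyclic; adding the trigger_lock ->
pernet_ops_rwsem edge closes the loop, which is the condition the
warning reports.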

Thanks,
Robin.

>     ======================================================
>     WARNING: possible circular locking dependency detected
>     6.16-rc7+unreleased-arm64-cknow #1 Not tainted
>     ------------------------------------------------------
>     modprobe/936 is trying to acquire lock:
>     ffffc943e0edc3b0 (pernet_ops_rwsem){++++}-{4:4}, at: register_netdevice_notifier+0x38/0x148
> 
>     but task is already holding lock:
>     ffff0001f2762248 (&led_cdev->trigger_lock){+.+.}-{4:4}, at: led_trigger_register+0x14c/0x1e0
> 
>     which lock already depends on the new lock.
> 
> 
>     the existing dependency chain (in reverse order) is:
> 
>     -> #3 (&led_cdev->trigger_lock){+.+.}-{4:4}:
>            lock_acquire+0x1cc/0x348
>            down_write+0x40/0xd8
>            led_trigger_set_default+0x5c/0x170
>            led_classdev_register_ext+0x340/0x488
>            __sdhci_add_host+0x190/0x368 [sdhci]
>            dwcmshc_probe+0x2b8/0x6b0 [sdhci_of_dwcmshc]
>            platform_probe+0x70/0xe8
>            really_probe+0xc8/0x3a0
>            __driver_probe_device+0x84/0x160
>            driver_probe_device+0x44/0x128
>            __device_attach_driver+0xc4/0x170
>            bus_for_each_drv+0x90/0xf8
>            __device_attach_async_helper+0xc0/0x120
>            async_run_entry_fn+0x40/0x180
>            process_one_work+0x23c/0x640
>            worker_thread+0x1b4/0x360
>            kthread+0x150/0x250
>            ret_from_fork+0x10/0x20
> 
>     -> #2 (triggers_list_lock){++++}-{4:4}:
>            lock_acquire+0x1cc/0x348
>            down_write+0x40/0xd8
>            led_trigger_register+0x58/0x1e0
>            phy_led_triggers_register+0xf4/0x258 [libphy]
>            phy_attach_direct+0x328/0x3a8 [libphy]
>            phylink_fwnode_phy_connect+0xb0/0x138 [phylink]
>            __stmmac_open+0xec/0x520 [stmmac]
>            stmmac_open+0x4c/0xe8 [stmmac]
>            __dev_open+0x13c/0x310
>            __dev_change_flags+0x1d4/0x260
>            netif_change_flags+0x2c/0x80
>            dev_change_flags+0x90/0xd0
>            devinet_ioctl+0x55c/0x730
>            inet_ioctl+0x1e4/0x200
>            sock_do_ioctl+0x6c/0x140
>            sock_ioctl+0x328/0x3c0
>            __arm64_sys_ioctl+0xb4/0x118
>            invoke_syscall+0x6c/0x100
>            el0_svc_common.constprop.0+0x48/0xf0
>            do_el0_svc+0x24/0x38
>            el0_svc+0x54/0x1e0
>            el0t_64_sync_handler+0x10c/0x140
>            el0t_64_sync+0x198/0x1a0
> 
>     -> #1 (rtnl_mutex){+.+.}-{4:4}:
>            lock_acquire+0x1cc/0x348
>            __mutex_lock+0xac/0x590
>            mutex_lock_nested+0x2c/0x40
>            rtnl_lock+0x24/0x38
>            register_netdevice_notifier+0x40/0x148
>            rtnetlink_init+0x40/0x68
>            netlink_proto_init+0x120/0x158
>            do_one_initcall+0x88/0x3b8
>            kernel_init_freeable+0x2d0/0x340
>            kernel_init+0x28/0x160
>            ret_from_fork+0x10/0x20
> 
>     -> #0 (pernet_ops_rwsem){++++}-{4:4}:
>            check_prev_add+0x114/0xcb8
>            __lock_acquire+0x12e8/0x15f0
>            lock_acquire+0x1cc/0x348
>            down_write+0x40/0xd8
>            register_netdevice_notifier+0x38/0x148
>            netdev_trig_activate+0x18c/0x1e8 [ledtrig_netdev]
>            led_trigger_set+0x1d4/0x328
>            led_trigger_register+0x194/0x1e0
>            netdev_led_trigger_init+0x20/0xff8 [ledtrig_netdev]
>            do_one_initcall+0x88/0x3b8
>            do_init_module+0x5c/0x270
>            load_module+0x1ed8/0x2608
>            init_module_from_file+0x94/0x100
>            idempotent_init_module+0x1e8/0x2f0
>            __arm64_sys_finit_module+0x70/0xe8
>            invoke_syscall+0x6c/0x100
>            el0_svc_common.constprop.0+0x48/0xf0
>            do_el0_svc+0x24/0x38
>            el0_svc+0x54/0x1e0
>            el0t_64_sync_handler+0x10c/0x140
>            el0t_64_sync+0x198/0x1a0
> 
>     other info that might help us debug this:
> 
>     Chain exists of:
>       pernet_ops_rwsem --> triggers_list_lock --> &led_cdev->trigger_lock
> 
>      Possible unsafe locking scenario:
> 
>            CPU0                    CPU1
>            ----                    ----
>       lock(&led_cdev->trigger_lock);
>                                    lock(triggers_list_lock);
>                                    lock(&led_cdev->trigger_lock);
>       lock(pernet_ops_rwsem);
> 
>      *** DEADLOCK ***
> 
>     2 locks held by modprobe/936:
>      #0: ffffc943e0d2baa8 (leds_list_lock){++++}-{4:4}, at: led_trigger_register+0x10c/0x1e0
>      #1: ffff0001f2762248 (&led_cdev->trigger_lock){+.+.}-{4:4}, at: led_trigger_register+0x14c/0x1e0
> 
>     stack backtrace:
>     CPU: 0 UID: 0 PID: 936 Comm: modprobe Not tainted 6.16-rc7+unreleased-arm64-cknow #1 PREEMPTLAZY  Debian 6.16~rc7-2~exp1
>     Hardware name: FriendlyElec NanoPi R5S (DT)
>     Call trace:
>      show_stack+0x34/0xa0 (C)
>      dump_stack_lvl+0x70/0x98
>      dump_stack+0x18/0x24
>      print_circular_bug+0x230/0x280
>      check_noncircular+0x174/0x188
>      check_prev_add+0x114/0xcb8
>      __lock_acquire+0x12e8/0x15f0
>      lock_acquire+0x1cc/0x348
>      down_write+0x40/0xd8
>      register_netdevice_notifier+0x38/0x148
>      netdev_trig_activate+0x18c/0x1e8 [ledtrig_netdev]
>      led_trigger_set+0x1d4/0x328
>      led_trigger_register+0x194/0x1e0
>      netdev_led_trigger_init+0x20/0xff8 [ledtrig_netdev]
>      do_one_initcall+0x88/0x3b8
>      do_init_module+0x5c/0x270
>      load_module+0x1ed8/0x2608
>      init_module_from_file+0x94/0x100
>      idempotent_init_module+0x1e8/0x2f0
>      __arm64_sys_finit_module+0x70/0xe8
>      invoke_syscall+0x6c/0x100
>      el0_svc_common.constprop.0+0x48/0xf0
>      do_el0_svc+0x24/0x38
>      el0_svc+0x54/0x1e0
>      el0t_64_sync_handler+0x10c/0x140
>      el0t_64_sync+0x198/0x1a0
>     leds-gpio gpio-leds: bus: 'platform': really_probe: bound device to driver leds-gpio
> 
> The full serial log can be found at [3]. It is quite verbose, and the
> boot took way longer than normal, as the following was added to cmdline:
> ``dyndbg="file dd.c func really_probe +p" maxcpus=1``
> 
> Feel free to ask for additional info and/or to run tests.
> 
> Cheers,
>    Diederik
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git/commit/?h=arm/fixes&id=912b1f2a796ec73530a709b11821cb0c249fb23e
> [2] https://lore.kernel.org/linux-rockchip/f81b88df-9959-4968-a60a-b7efd3d5ea24@arm.com/
> [3] https://paste.sr.ht/~diederik/142e92bfb29bbb58bca18a74cdffc5e0ba79081c

