Message-ID: <475ee9ae8cdca5ce86b708fe0ade7c9d@manjaro.org>
Date: Wed, 30 Jul 2025 21:50:25 +0200
From: Dragan Simic <dsimic@...jaro.org>
To: Robin Murphy <robin.murphy@....com>
Cc: Diederik de Haas <didi.debian@...ow.org>, Lee Jones <lee@...nel.org>,
 Pavel Machek <pavel@...nel.org>, Andrew Lunn <andrew+netdev@...n.ch>, "David
 S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Jakub
 Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 linux-leds@...r.kernel.org, netdev@...r.kernel.org,
 linux-arm-kernel@...ts.infradead.org, linux-rockchip@...ts.infradead.org,
 linux-kernel@...r.kernel.org
Subject: Re: BUG: Circular locking dependency on netdev led trigger on NanoPi
 R5S

Hello Robin and Diederik,

On 2025-07-25 20:12, Robin Murphy wrote:
> On 2025-07-25 6:48 pm, Diederik de Haas wrote:
>> I have a FriendlyELEC NanoPi R5S (with rk3568 SoC) and in commit
>> 1631cbdb8089 ("arm64: dts: rockchip: Improve LED config for NanoPi R5S")
>> 
>> I tried to improve its LED configuration and that included
>> ``linux,default-trigger = "netdev"``
>> 
>> Problem: sometimes I got a 'hung task' error which resulted in the WAN
>> port not coming up (that's the only one I use), and logging in via
>> serial also didn't work, so pulling the plug was the only remedy.
>> 
>> Robin Murphy quickly identified that it likely had to do with LED
>> triggers, and removing those netdev triggers made the problem go away[1].
>> To find out what actually caused it, I built a kernel with PROVE_LOCKING
>> and PRINTK_CALLER enabled, which, after adding a patch that fixed an
>> OOPS [2], showed the underlying problem:
> 
> For the record, I think the actual deadlock condition Diederik's
> system hits in practice is a shorter cycle, wherein immediately after
> acquiring pernet_ops_rwsem, thread #0 then tries to take rtnl_mutex,
> which forms a straight inversion against thread #2 (which holds
> rtnl_mutex from devinet_ioctl()).

Thanks for the bug report and for the additional insights!

I've spent some time digging through the LED subsystem, which I'm
already somewhat familiar with, and I think I've narrowed down the
root cause of this deadlock.

I'll send a preliminary patch soon, after I make sure that the root
cause is identified correctly, and I hope Diederik will be willing
to test the patch.  If so, and if the patch turns out to fix the
issue, I'll prepare and submit a proper patch, of course.
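
For illustration only, the kind of straight inversion Robin describes
above can be reproduced in user space.  What follows is a minimal Python
sketch, not kernel code: `rtnl_stand_in` plays the role of rtnl_mutex,
while `held_by_thread0` is a hypothetical stand-in for whichever lock
thread #0 already holds and thread #2 ends up waiting for (the quoted
text does not name it).  The function name, the timeout, and the
barriers are all assumptions of the sketch; timeouts are used so the
demo reports the stall instead of hanging.

```python
import threading


def demo_inversion(timeout: float = 0.2) -> dict:
    """Two threads take two locks in opposite order; both attempts stall."""
    held_by_thread0 = threading.Lock()  # hypothetical: a lock thread #0 holds
    rtnl_stand_in = threading.Lock()    # stand-in for rtnl_mutex
    both_hold = threading.Barrier(2)    # both threads hold their first lock
    both_tried = threading.Barrier(2)   # both threads finished their attempt
    results = {}

    def worker(name, first, second):
        with first:
            both_hold.wait()
            # The other thread holds `second` and will not release it
            # until we are done trying, so this can only time out.
            got = second.acquire(timeout=timeout)
            results[name] = got
            both_tried.wait()
            if got:
                second.release()

    t0 = threading.Thread(
        target=worker, args=("thread0", held_by_thread0, rtnl_stand_in))
    t2 = threading.Thread(
        target=worker, args=("thread2", rtnl_stand_in, held_by_thread0))
    t0.start()
    t2.start()
    t0.join()
    t2.join()
    return results
```

Calling `demo_inversion()` returns a dict in which neither thread got
its second lock.  With real blocking locks and no timeout, that is the
point where both tasks would hang.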

>>     ======================================================
>>     WARNING: possible circular locking dependency detected
>>     6.16-rc7+unreleased-arm64-cknow #1 Not tainted
>>     ------------------------------------------------------
>>     modprobe/936 is trying to acquire lock:
>>     ffffc943e0edc3b0 (pernet_ops_rwsem){++++}-{4:4}, at: register_netdevice_notifier+0x38/0x148
>> 
>>     but task is already holding lock:
>>     ffff0001f2762248 (&led_cdev->trigger_lock){+.+.}-{4:4}, at: led_trigger_register+0x14c/0x1e0
>> 
>>     which lock already depends on the new lock.
>> 
>>     the existing dependency chain (in reverse order) is:
>> 
>>     -> #3 (&led_cdev->trigger_lock){+.+.}-{4:4}:
>>            lock_acquire+0x1cc/0x348
>>            down_write+0x40/0xd8
>>            led_trigger_set_default+0x5c/0x170
>>            led_classdev_register_ext+0x340/0x488
>>            __sdhci_add_host+0x190/0x368 [sdhci]
>>            dwcmshc_probe+0x2b8/0x6b0 [sdhci_of_dwcmshc]
>>            platform_probe+0x70/0xe8
>>            really_probe+0xc8/0x3a0
>>            __driver_probe_device+0x84/0x160
>>            driver_probe_device+0x44/0x128
>>            __device_attach_driver+0xc4/0x170
>>            bus_for_each_drv+0x90/0xf8
>>            __device_attach_async_helper+0xc0/0x120
>>            async_run_entry_fn+0x40/0x180
>>            process_one_work+0x23c/0x640
>>            worker_thread+0x1b4/0x360
>>            kthread+0x150/0x250
>>            ret_from_fork+0x10/0x20
>> 
>>     -> #2 (triggers_list_lock){++++}-{4:4}:
>>            lock_acquire+0x1cc/0x348
>>            down_write+0x40/0xd8
>>            led_trigger_register+0x58/0x1e0
>>            phy_led_triggers_register+0xf4/0x258 [libphy]
>>            phy_attach_direct+0x328/0x3a8 [libphy]
>>            phylink_fwnode_phy_connect+0xb0/0x138 [phylink]
>>            __stmmac_open+0xec/0x520 [stmmac]
>>            stmmac_open+0x4c/0xe8 [stmmac]
>>            __dev_open+0x13c/0x310
>>            __dev_change_flags+0x1d4/0x260
>>            netif_change_flags+0x2c/0x80
>>            dev_change_flags+0x90/0xd0
>>            devinet_ioctl+0x55c/0x730
>>            inet_ioctl+0x1e4/0x200
>>            sock_do_ioctl+0x6c/0x140
>>            sock_ioctl+0x328/0x3c0
>>            __arm64_sys_ioctl+0xb4/0x118
>>            invoke_syscall+0x6c/0x100
>>            el0_svc_common.constprop.0+0x48/0xf0
>>            do_el0_svc+0x24/0x38
>>            el0_svc+0x54/0x1e0
>>            el0t_64_sync_handler+0x10c/0x140
>>            el0t_64_sync+0x198/0x1a0
>> 
>>     -> #1 (rtnl_mutex){+.+.}-{4:4}:
>>            lock_acquire+0x1cc/0x348
>>            __mutex_lock+0xac/0x590
>>            mutex_lock_nested+0x2c/0x40
>>            rtnl_lock+0x24/0x38
>>            register_netdevice_notifier+0x40/0x148
>>            rtnetlink_init+0x40/0x68
>>            netlink_proto_init+0x120/0x158
>>            do_one_initcall+0x88/0x3b8
>>            kernel_init_freeable+0x2d0/0x340
>>            kernel_init+0x28/0x160
>>            ret_from_fork+0x10/0x20
>> 
>>     -> #0 (pernet_ops_rwsem){++++}-{4:4}:
>>            check_prev_add+0x114/0xcb8
>>            __lock_acquire+0x12e8/0x15f0
>>            lock_acquire+0x1cc/0x348
>>            down_write+0x40/0xd8
>>            register_netdevice_notifier+0x38/0x148
>>            netdev_trig_activate+0x18c/0x1e8 [ledtrig_netdev]
>>            led_trigger_set+0x1d4/0x328
>>            led_trigger_register+0x194/0x1e0
>>            netdev_led_trigger_init+0x20/0xff8 [ledtrig_netdev]
>>            do_one_initcall+0x88/0x3b8
>>            do_init_module+0x5c/0x270
>>            load_module+0x1ed8/0x2608
>>            init_module_from_file+0x94/0x100
>>            idempotent_init_module+0x1e8/0x2f0
>>            __arm64_sys_finit_module+0x70/0xe8
>>            invoke_syscall+0x6c/0x100
>>            el0_svc_common.constprop.0+0x48/0xf0
>>            do_el0_svc+0x24/0x38
>>            el0_svc+0x54/0x1e0
>>            el0t_64_sync_handler+0x10c/0x140
>>            el0t_64_sync+0x198/0x1a0
>> 
>>     other info that might help us debug this:
>> 
>>     Chain exists of:
>>       pernet_ops_rwsem --> triggers_list_lock --> &led_cdev->trigger_lock
>> 
>>      Possible unsafe locking scenario:
>> 
>>            CPU0                    CPU1
>>            ----                    ----
>>       lock(&led_cdev->trigger_lock);
>>                                    lock(triggers_list_lock);
>>                                    lock(&led_cdev->trigger_lock);
>>       lock(pernet_ops_rwsem);
>> 
>>      *** DEADLOCK ***
>> 
>>     2 locks held by modprobe/936:
>>      #0: ffffc943e0d2baa8 (leds_list_lock){++++}-{4:4}, at: led_trigger_register+0x10c/0x1e0
>>      #1: ffff0001f2762248 (&led_cdev->trigger_lock){+.+.}-{4:4}, at: led_trigger_register+0x14c/0x1e0
>> 
>>     stack backtrace:
>>     CPU: 0 UID: 0 PID: 936 Comm: modprobe Not tainted 6.16-rc7+unreleased-arm64-cknow #1 PREEMPTLAZY  Debian 6.16~rc7-2~exp1
>>     Hardware name: FriendlyElec NanoPi R5S (DT)
>>     Call trace:
>>      show_stack+0x34/0xa0 (C)
>>      dump_stack_lvl+0x70/0x98
>>      dump_stack+0x18/0x24
>>      print_circular_bug+0x230/0x280
>>      check_noncircular+0x174/0x188
>>      check_prev_add+0x114/0xcb8
>>      __lock_acquire+0x12e8/0x15f0
>>      lock_acquire+0x1cc/0x348
>>      down_write+0x40/0xd8
>>      register_netdevice_notifier+0x38/0x148
>>      netdev_trig_activate+0x18c/0x1e8 [ledtrig_netdev]
>>      led_trigger_set+0x1d4/0x328
>>      led_trigger_register+0x194/0x1e0
>>      netdev_led_trigger_init+0x20/0xff8 [ledtrig_netdev]
>>      do_one_initcall+0x88/0x3b8
>>      do_init_module+0x5c/0x270
>>      load_module+0x1ed8/0x2608
>>      init_module_from_file+0x94/0x100
>>      idempotent_init_module+0x1e8/0x2f0
>>      __arm64_sys_finit_module+0x70/0xe8
>>      invoke_syscall+0x6c/0x100
>>      el0_svc_common.constprop.0+0x48/0xf0
>>      do_el0_svc+0x24/0x38
>>      el0_svc+0x54/0x1e0
>>      el0t_64_sync_handler+0x10c/0x140
>>      el0t_64_sync+0x198/0x1a0
>>     leds-gpio gpio-leds: bus: 'platform': really_probe: bound device to driver leds-gpio
>> 
>> The full serial log can be found at [3]. It is quite verbose, and the
>> boot took way longer than normal as the following was added to cmdline:
>> ``dyndbg="file dd.c func really_probe +p" maxcpus=1``
>> 
>> Feel free to ask for additional info and/or to run tests.
>> 
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git/commit/?h=arm/fixes&id=912b1f2a796ec73530a709b11821cb0c249fb23e
>> [2] https://lore.kernel.org/linux-rockchip/f81b88df-9959-4968-a60a-b7efd3d5ea24@arm.com/
>> [3] https://paste.sr.ht/~diederik/142e92bfb29bbb58bca18a74cdffc5e0ba79081c
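
As a reading aid for the quoted report, the dependency chain can be
written down as a small graph and checked for a cycle.  The Python
sketch below is only a simplified model of my reading of chain entries
#0 to #3 (each edge means "the second lock was acquired while the first
was held").  It is not how lockdep itself is implemented, and the names
`LOCK_ORDER` and `find_cycle` are made up for this example.

```python
from typing import Dict, List, Optional

# Edges as read from the quoted report: held lock -> lock acquired next.
LOCK_ORDER: Dict[str, List[str]] = {
    # #1: register_netdevice_notifier() takes rtnl_mutex after the rwsem
    "pernet_ops_rwsem": ["rtnl_mutex"],
    # #2: led_trigger_register() is reached from devinet_ioctl()
    "rtnl_mutex": ["triggers_list_lock"],
    # #3: led_trigger_set_default() takes the per-LED trigger lock
    "triggers_list_lock": ["&led_cdev->trigger_lock"],
    # #0: netdev_trig_activate() registers a netdevice notifier
    "&led_cdev->trigger_lock": ["pernet_ops_rwsem"],
}


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of nodes (first == last), or None."""
    path: List[str] = []  # nodes on the current depth-first path
    done = set()          # nodes fully explored without finding a cycle

    def dfs(node: str) -> Optional[List[str]]:
        if node in path:
            return path[path.index(node):] + [node]
        if node in done:
            return None
        path.append(node)
        for nxt in graph.get(node, []):
            cycle = dfs(nxt)
            if cycle:
                return cycle
        path.pop()
        done.add(node)
        return None

    for start in graph:
        cycle = dfs(start)
        if cycle:
            return cycle
    return None


if __name__ == "__main__":
    print(" -> ".join(find_cycle(LOCK_ORDER) or ["no cycle"]))
```

Under this model, breaking any one of the four edges makes
`find_cycle()` return None, which is the property a fix would need to
have.  Which edge is the right one to break is a separate question that
the sketch does not answer.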
