lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20251003182407.70d495ba@booty>
Date: Fri, 3 Oct 2025 18:24:07 +0200
From: Luca Ceresoli <luca.ceresoli@...tlin.com>
To: Stephen Boyd <sboyd@...nel.org>
Cc: Danilo Krummrich <dakr@...nel.org>, Greg Kroah-Hartman
 <gregkh@...uxfoundation.org>, Len Brown <len.brown@...el.com>, Michael
 Turquette <mturquette@...libre.com>, Miquel Raynal
 <miquel.raynal@...tlin.com>, Pavel Machek <pavel@....cz>, "Rafael J.
 Wysocki" <rafael@...nel.org>, Thomas Petazzoni
 <thomas.petazzoni@...tlin.com>, linux-pm@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-clk@...r.kernel.org, Chen-Yu Tsai
 <wenst@...omium.org>, Lucas Stach <l.stach@...gutronix.de>, Laurent
 Pinchart <laurent.pinchart@...asonboard.com>, Marek Vasut <marex@...x.de>,
 Ulf Hansson <ulf.hansson@...aro.org>, Kevin Hilman <khilman@...nel.org>,
 Fabio Estevam <festevam@...x.de>, Jacky Bai <ping.bai@....com>, Peng Fan
 <peng.fan@....com>, Shawn Guo <shawnguo@...nel.org>, Shengjiu Wang
 <shengjiu.wang@....com>, linux-imx@....com, Ian Ray
 <ian.ray@...ealthcare.com>, Hervé Codina
 <herve.codina@...tlin.com>, Saravana Kannan <saravanak@...gle.com>
Subject: Re: [PATCH RFC 00/10] Fix the ABBA locking situation between clk
 and runtime PM

Hello Stephen, all,

On Mon, 14 Apr 2025 18:00:15 -0700
Stephen Boyd <sboyd@...nel.org> wrote:

> Quoting Miquel Raynal (2025-03-26 11:26:15)
> > As explained in the following thread, there is a known ABBA locking
> > dependency between clk and runtime PM.
> > Link: https://lore.kernel.org/linux-clk/20240527181928.4fc6b5f0@xps-13/
> > 
> > The problem is that the clk subsystem uses a mutex to protect concurrent
> > accesses to its tree structure, and so do other subsystems such as
> > generic power domains. While it holds its own mutex, the clk subsystem
> > performs runtime PM calls which end up executing callbacks from other
> > subsystems (again, gen PD is in the loop). But typically power domains
> > may also need to perform clock related operations, and thus the
> > following two situations may happen:
> > 
> > mutex_lock(clk);
> > mutex_lock(genpd);
> > 
> > or
> > 
> > mutex_lock(genpd);
> > mutex_lock(clk);
> > 
> > As of today I know that at least NXP i.MX8MP and MediaTek MT8183 SoCs
> > are complex enough to face this kind of issues.
> > 
> > There's been a first workaround to "silence" lockdep with the most
> > obvious case triggering the warning: making sure all clocks are RPM
> > enabled before running the clk_disable_unused() work, but this is just
> > addressing one situation among many other potentially problematic
> > situations. In the past, both Laurent Pinchart and Marek Vasut have
> > experienced these issues when enabling HDMI and audio support,
> > respectively.
> > 
> > Following a discussion we had at last Plumbers with Steven, I am
> > proposing to decouple both locks by changing a bit the clk approach:
> > let's always runtime resume all clocks that we *might* need before
> > taking the clock lock. But how do we know the list? Well, depending on
> > the situation we may either need to wake up:
> > - the upper part of the tree during prepare/unprepare operations.
> > - the lower part of the tree during (read) rate operations.
> > - the upper part and the lower part of the tree otherwise (especially
> >   during rate changes which may involve reparenting).  
> 
> Thanks for taking on this work. This problem is coming up more and more
> often.

Reviving this thread after today I had a very rare occurrence of
apparently this same issue:

  WARNING: possible circular locking dependency detected

It happened on imx8mp, on a board and with a setup that I'm using since
many months to do unrelated development (mostly DRM). It was a very rare
occurrence because I always have clk_ignore_unused in my kernel cmdline.

On my setup that warning appeared exactly once in thousands of boots
I've done in several months. Just rebooting without changing anything
and it didn't show up again.

Here's the full warning message:

[    5.077473] ======================================================
[    5.083658] WARNING: possible circular locking dependency detected
[    5.089845] 6.17.0-rc4+ #2 Tainted: G                T  
[    5.095164] ------------------------------------------------------
[    5.101346] kworker/u16:4/52 is trying to acquire lock:
[    5.106576] ffff0000016ae740 (&genpd->mlock){+.+.}-{4:4}, at: genpd_lock_mtx+0x20/0x38
[    5.114533] 
[    5.114533] but task is already holding lock:
[    5.120368] ffff800084eb5258 (prepare_lock){+.+.}-{4:4}, at: clk_prepare_lock+0x38/0xc0
[    5.128404] 
[    5.128404] which lock already depends on the new lock.
[    5.128404] 
[    5.136583] 
[    5.136583] the existing dependency chain (in reverse order) is:
[    5.144070] 
[    5.144070] -> #1 (prepare_lock){+.+.}-{4:4}:
[    5.149924]        __mutex_lock+0xb8/0x7f0
[    5.154034]        mutex_lock_nested+0x2c/0x40
[    5.158487]        clk_prepare_lock+0x58/0xc0
[    5.162849]        clk_prepare+0x28/0x58
[    5.166780]        clk_bulk_prepare+0x54/0xe8
[    5.171141]        imx_pgc_power_up+0x80/0x378
[    5.175592]        _genpd_power_on+0xa0/0x168
[    5.179955]        genpd_power_on+0xd8/0x248
[    5.184234]        genpd_runtime_resume+0x12c/0x298
[    5.189121]        __rpm_callback+0x50/0x200
[    5.193400]        rpm_callback+0x7c/0x90
[    5.197414]        rpm_resume+0x534/0x718
[    5.201432]        __pm_runtime_resume+0x58/0xa8
[    5.206056]        pm_runtime_get_suppliers+0x6c/0xa0
[    5.211117]        __driver_probe_device+0x50/0x140
[    5.216002]        driver_probe_device+0xe0/0x170
[    5.220710]        __driver_attach+0xa0/0x1c0
[    5.225074]        bus_for_each_dev+0x90/0xf8
[    5.229436]        driver_attach+0x2c/0x40
[    5.233538]        bus_add_driver+0xec/0x218
[    5.237816]        driver_register+0x64/0x138
[    5.242178]        __platform_driver_register+0x2c/0x40
[    5.247411]        hotplug_bridge_dynconn_get_modes+0x28/0x48 [hotplug_bridge]
[    5.254654]        do_one_initcall+0x84/0x358
[    5.259020]        do_init_module+0x60/0x268
[    5.263298]        load_module+0x1fc4/0x2108
[    5.267574]        init_module_from_file+0x90/0xe0
[    5.272372]        idempotent_init_module+0x1f8/0x300
[    5.277432]        __arm64_sys_finit_module+0x6c/0xb8
[    5.282491]        invoke_syscall+0x50/0x120
[    5.286771]        el0_svc_common.constprop.0+0x48/0xf0
[    5.292004]        do_el0_svc+0x24/0x38
[    5.295848]        el0_svc+0x4c/0x160
[    5.299519]        el0t_64_sync_handler+0xa0/0xe8
[    5.304229]        el0t_64_sync+0x198/0x1a0
[    5.308420] 
[    5.308420] -> #0 (&genpd->mlock){+.+.}-{4:4}:
[    5.314359]        __lock_acquire+0x1338/0x1f50
[    5.318897]        lock_acquire+0x1c4/0x350
[    5.323089]        __mutex_lock+0xb8/0x7f0
[    5.327194]        mutex_lock_nested+0x2c/0x40
[    5.331645]        genpd_lock_mtx+0x20/0x38
[    5.335834]        genpd_runtime_resume+0x118/0x298
[    5.340721]        __rpm_callback+0x50/0x200
[    5.344997]        rpm_callback+0x7c/0x90
[    5.349013]        rpm_resume+0x534/0x718
[    5.353029]        __pm_runtime_resume+0x58/0xa8
[    5.357653]        clk_pm_runtime_get.part.0.isra.0+0x24/0x98
[    5.363408]        __clk_register+0x51c/0x970
[    5.367771]        devm_clk_hw_register+0x64/0xe8
[    5.372481]        imx8mp_hsio_blk_ctrl_probe+0xa0/0xf8
[    5.377712]        imx8mp_blk_ctrl_probe+0x358/0x568
[    5.382684]        platform_probe+0x64/0xa8
[    5.386875]        really_probe+0xc4/0x2b8
[    5.390976]        __driver_probe_device+0x80/0x140
[    5.395860]        driver_probe_device+0xe0/0x170
[    5.400571]        __device_attach_driver+0xc0/0x148
[    5.405542]        bus_for_each_drv+0x9c/0x108
[    5.409991]        __device_attach+0xa8/0x1a0
[    5.414354]        device_initial_probe+0x1c/0x30
[    5.419065]        bus_probe_device+0xb4/0xc0
[    5.423428]        deferred_probe_work_func+0x90/0xd8
[    5.428485]        process_one_work+0x214/0x618
[    5.433027]        worker_thread+0x1b4/0x368
[    5.437305]        kthread+0x150/0x238
[    5.441062]        ret_from_fork+0x10/0x20
[    5.445165] 
[    5.445165] other info that might help us debug this:
[    5.445165] 
[    5.453171]  Possible unsafe locking scenario:
[    5.453171] 
[    5.459091]        CPU0                    CPU1
[    5.463622]        ----                    ----
[    5.468151]   lock(prepare_lock);
[    5.471476]                                lock(&genpd->mlock);
[    5.477405]                                lock(prepare_lock);
[    5.483248]   lock(&genpd->mlock);
[    5.486655] 
[    5.486655]  *** DEADLOCK ***
[    5.486655] 
[    5.492577] 4 locks held by kworker/u16:4/52:
[    5.496937]  #0: ffff000000030948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x198/0x618
[    5.507051]  #1: ffff800086e3bd70 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x1c0/0x618
[    5.516299]  #2: ffff000000c608f8 (&dev->mutex){....}-{4:4}, at: __device_attach+0x44/0x1a0
[    5.524680]  #3: ffff800084eb5258 (prepare_lock){+.+.}-{4:4}, at: clk_prepare_lock+0x38/0xc0
[    5.533150] 
[    5.533150] stack backtrace:
[    5.537512] CPU: 2 UID: 0 PID: 52 Comm: kworker/u16:4 Tainted: G                T   6.17.0-rc4+ #2 PREEMPT 
[    5.547262] Tainted: [T]=RANDSTRUCT
[    5.550752] Hardware name: ...
[    5.557284] Workqueue: events_unbound deferred_probe_work_func
[    5.563128] Call trace:
[    5.565578]  show_stack+0x20/0x38 (C)
[    5.569251]  dump_stack_lvl+0x8c/0xd0
[    5.572919]  dump_stack+0x18/0x28
[    5.576239]  print_circular_bug+0x28c/0x370
[    5.580431]  check_noncircular+0x178/0x190
[    5.584537]  __lock_acquire+0x1338/0x1f50
[    5.588558]  lock_acquire+0x1c4/0x350
[    5.592231]  __mutex_lock+0xb8/0x7f0
[    5.595815]  mutex_lock_nested+0x2c/0x40
[    5.599750]  genpd_lock_mtx+0x20/0x38
[    5.603418]  genpd_runtime_resume+0x118/0x298
[    5.607786]  __rpm_callback+0x50/0x200
[    5.611543]  rpm_callback+0x7c/0x90
[    5.615041]  rpm_resume+0x534/0x718
[    5.618539]  __pm_runtime_resume+0x58/0xa8
[    5.622644]  clk_pm_runtime_get.part.0.isra.0+0x24/0x98
[    5.627876]  __clk_register+0x51c/0x970
[    5.631717]  devm_clk_hw_register+0x64/0xe8
[    5.635909]  imx8mp_hsio_blk_ctrl_probe+0xa0/0xf8
[    5.640622]  imx8mp_blk_ctrl_probe+0x358/0x568
[    5.645071]  platform_probe+0x64/0xa8
[    5.648743]  really_probe+0xc4/0x2b8
[    5.652326]  __driver_probe_device+0x80/0x140
[    5.656693]  driver_probe_device+0xe0/0x170
[    5.660886]  __device_attach_driver+0xc0/0x148
[    5.665339]  bus_for_each_drv+0x9c/0x108
[    5.669269]  __device_attach+0xa8/0x1a0
[    5.673115]  device_initial_probe+0x1c/0x30
[    5.677308]  bus_probe_device+0xb4/0xc0
[    5.681152]  deferred_probe_work_func+0x90/0xd8
[    5.685691]  process_one_work+0x214/0x618
[    5.689713]  worker_thread+0x1b4/0x368
[    5.693471]  kthread+0x150/0x238
[    5.696709]  ret_from_fork+0x10/0x20

You're welcome to ask for more info, even though I'm afraid I might be
unable to provide them given how rare this event is.

Best regards,
Luca

-- 
Luca Ceresoli, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ