linux-kernel - Re: [PATCH v3] workqueue: don't skip lockdep work dependency in cancel_work

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <a50218b6-fc42-7f12-155a-5e01fc8dd1a0@roeck-us.net>
Date:   Wed, 13 Sep 2023 08:59:04 -0700
From:   Guenter Roeck <linux@...ck-us.net>
To:     Johannes Berg <johannes@...solutions.net>,
        Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc:     Lai Jiangshan <jiangshanlai@...il.com>, Tejun Heo <tj@...nel.org>,
        Hillf Danton <hdanton@...a.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Heyi Guo <guoheyi@...ux.alibaba.com>, netdev@...r.kernel.org
Subject: Re: [PATCH v3] workqueue: don't skip lockdep work dependency in
 cancel_work_sync()

On 9/13/23 07:41, Johannes Berg wrote:
> Hi Guenter,
> 
>> This patch results in the attached lockdep splat when running the
>> ast2600-evb emulation in qemu with aspeed_g5_defconfig and lock debugging
>> enabled. Reverting this patch fixes the problem.
> 
> Umm ... That's only true if you think the problem is the lockdep splat,
> rather than the actual potential deadlock?!
> 

It was hard for me to say because the workqueue lock doesn't exist
in the first place if lockdep debugging is not enabled.

>> [    9.809960] ======================================================
>> [    9.810053] WARNING: possible circular locking dependency detected
>> [    9.810196] 6.6.0-rc1-00004-g6faca50f629f #1 Tainted: G                 N
> 
> I don't have this exact tree, but on 6.6-rc1,
> 

Meh, I just included a couple of bug fixes not yet available in 6.6-rc1.

>> [    9.810327] ------------------------------------------------------
>> [    9.810406] ip/357 is trying to acquire lock:
>> [    9.810501] 83af6c40 ((work_completion)(&(&dev->state_queue)->work)){+.+.}-{0:0}, at: __flush_work+0x40/0x550
>> [    9.811052]
>> [    9.811052] but task is already holding lock:
>> [    9.811133] 81639924 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x124/0x514
>> [    9.811264]
>> [    9.811264] which lock already depends on the new lock.
>> [    9.811264]
>> [    9.811361]
>> [    9.811361] the existing dependency chain (in reverse order) is:
>> [    9.811466]
>> [    9.811466] -> #1 (rtnl_mutex){+.+.}-{3:3}:
>> [    9.811616]        lock_acquire+0xfc/0x368
>> [    9.811717]        __mutex_lock+0x90/0xf00
>> [    9.811782]        mutex_lock_nested+0x24/0x2c
>> [    9.811845]        ftgmac100_reset+0x1c/0x1dc
> 
> 
> This does indeed take the RTNL:
> 
> static void ftgmac100_reset(struct ftgmac100 *priv)
> {
>          struct net_device *netdev = priv->netdev;
>          int err;
> 
>          netdev_dbg(netdev, "Resetting NIC...\n");
> 
>          /* Lock the world */
>          rtnl_lock();
> 
> and is called from
> 
>> [    9.811907]        ftgmac100_adjust_link+0xc0/0x13c
>> [    9.811972]        phy_link_change+0x30/0x5c
>> [    9.812035]        phy_check_link_status+0x9c/0x11c
>> [    9.812100]        phy_state_machine+0x1c0/0x2c0
> 
> this work (phy_state_machine is the function), which
> 
>> [    9.812405] -> #0 ((work_completion)(&(&dev->state_queue)->work)){+.+.}-{0:0}:
>> [    9.812531]        check_prev_add+0x128/0x15ec
>> [    9.812594]        __lock_acquire+0x16ec/0x20cc
>> [    9.812656]        lock_acquire+0xfc/0x368
>> [    9.812712]        __flush_work+0x70/0x550
>> [    9.812769]        __cancel_work_timer+0x1e4/0x264
>> [    9.812833]        phy_stop+0x78/0x128
> 
> is cancelled by phy_stop() in phy_stop_machine():
> 
> void phy_stop_machine(struct phy_device *phydev)
> {
>          cancel_delayed_work_sync(&phydev->state_queue);
> 
> but of course that's called by the driver under RTNL:
> 
>> [    9.812889]        ftgmac100_stop+0x5c/0xac
>> [    9.812949]        __dev_close_many+0xb8/0x140
> 
> (__dev_close_many requires RTNL)
> 
> 
> So you have a potential deadlock in this driver. Yes, workqueue items
> and RTNL are basically incompatible. Don't do that. Now this bug was
> _probably_ added by commit 1baf2e50e48f ("drivers/net/ftgmac100: fix
> DHCP potential failure with systemd") which added a call to
> ftgmac100_reset() in ftgmac100_adjust_link() which is the thing called
> from the PHY state machine in the first place.
> 
> Should that be reverted? I don't know ... maybe it can be fixed
> differently.
> 
> 
> But anyway ... as far as lockdep/workqueue stuff is concerned I'd
> definitely call it a win rather than a bug! Yay for making lockdep
> useful - it found a deadlock situation for you! :-) No need to blame
> lockdep for that :P
> 

So you are saying that anything running in a workqueue must not
acquire rtnl_lock because cancel_[delayed_]work_sync() may be called
under rtnl_lock.

Fair point, but is that documented somewhere ? If not, how is anyone
supposed to know ? If it is not documented, I might we well argue that
cancel_[delayed_]work_sync() should not be called with rtnl_lock held
because some worker might hold that lock.

FWIW, it would be nice if the lockdep code would generate some other
message in this situation. Complaining about a deadlock involving a
lock that doesn't exist if lock debugging isn't enabled is not really
helpful and, yes, may result in reporters to falsely assume that this
lock is responsible for the potential deadlock.

Reverting 1baf2e50e48f does fix the problem as well.

Thanks,
Guenter