linux-kernel - Re: [PATCH v3] workqueue: don't skip lockdep work dependency in cancel_work

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ab6b6caf-6e6b-e4d7-f1e9-b178115c7fc5@roeck-us.net>
Date:   Wed, 13 Sep 2023 10:45:18 -0700
From:   Guenter Roeck <linux@...ck-us.net>
To:     Florian Fainelli <f.fainelli@...il.com>,
        Johannes Berg <johannes@...solutions.net>,
        Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
        Heyi Guo <guoheyi@...ux.alibaba.com>
Cc:     Lai Jiangshan <jiangshanlai@...il.com>, Tejun Heo <tj@...nel.org>,
        Hillf Danton <hdanton@...a.com>,
        LKML <linux-kernel@...r.kernel.org>, netdev@...r.kernel.org
Subject: Re: [PATCH v3] workqueue: don't skip lockdep work dependency in
 cancel_work_sync()

On 9/13/23 10:25, Florian Fainelli wrote:
> On 9/13/23 08:59, Guenter Roeck wrote:
>> On 9/13/23 07:41, Johannes Berg wrote:
>>> Hi Guenter,
>>>
>>>> This patch results in the attached lockdep splat when running the
>>>> ast2600-evb emulation in qemu with aspeed_g5_defconfig and lock debugging
>>>> enabled. Reverting this patch fixes the problem.
>>>
>>> Umm ... That's only true if you think the problem is the lockdep splat,
>>> rather than the actual potential deadlock?!
>>>
>>
>> It was hard for me to say because the workqueue lock doesn't exist
>> in the first place if lockdep debugging is not enabled.
>>
>>>> [    9.809960] ======================================================
>>>> [    9.810053] WARNING: possible circular locking dependency detected
>>>> [    9.810196] 6.6.0-rc1-00004-g6faca50f629f #1 Tainted: G                 N
>>>
>>> I don't have this exact tree, but on 6.6-rc1,
>>>
>>
>> Meh, I just included a couple of bug fixes not yet available in 6.6-rc1.
>>
>>>> [    9.810327] ------------------------------------------------------
>>>> [    9.810406] ip/357 is trying to acquire lock:
>>>> [    9.810501] 83af6c40 ((work_completion)(&(&dev->state_queue)->work)){+.+.}-{0:0}, at: __flush_work+0x40/0x550
>>>> [    9.811052]
>>>> [    9.811052] but task is already holding lock:
>>>> [    9.811133] 81639924 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x124/0x514
>>>> [    9.811264]
>>>> [    9.811264] which lock already depends on the new lock.
>>>> [    9.811264]
>>>> [    9.811361]
>>>> [    9.811361] the existing dependency chain (in reverse order) is:
>>>> [    9.811466]
>>>> [    9.811466] -> #1 (rtnl_mutex){+.+.}-{3:3}:
>>>> [    9.811616]        lock_acquire+0xfc/0x368
>>>> [    9.811717]        __mutex_lock+0x90/0xf00
>>>> [    9.811782]        mutex_lock_nested+0x24/0x2c
>>>> [    9.811845]        ftgmac100_reset+0x1c/0x1dc
>>>
>>>
>>> This does indeed take the RTNL:
>>>
>>> static void ftgmac100_reset(struct ftgmac100 *priv)
>>> {
>>>          struct net_device *netdev = priv->netdev;
>>>          int err;
>>>
>>>          netdev_dbg(netdev, "Resetting NIC...\n");
>>>
>>>          /* Lock the world */
>>>          rtnl_lock();
>>>
>>> and is called from
>>>
>>>> [    9.811907]        ftgmac100_adjust_link+0xc0/0x13c
>>>> [    9.811972]        phy_link_change+0x30/0x5c
>>>> [    9.812035]        phy_check_link_status+0x9c/0x11c
>>>> [    9.812100]        phy_state_machine+0x1c0/0x2c0
>>>
>>> this work (phy_state_machine is the function), which
>>>
>>>> [    9.812405] -> #0 ((work_completion)(&(&dev->state_queue)->work)){+.+.}-{0:0}:
>>>> [    9.812531]        check_prev_add+0x128/0x15ec
>>>> [    9.812594]        __lock_acquire+0x16ec/0x20cc
>>>> [    9.812656]        lock_acquire+0xfc/0x368
>>>> [    9.812712]        __flush_work+0x70/0x550
>>>> [    9.812769]        __cancel_work_timer+0x1e4/0x264
>>>> [    9.812833]        phy_stop+0x78/0x128
>>>
>>> is cancelled by phy_stop() in phy_stop_machine():
>>>
>>> void phy_stop_machine(struct phy_device *phydev)
>>> {
>>>          cancel_delayed_work_sync(&phydev->state_queue);
>>>
>>> but of course that's called by the driver under RTNL:
>>>
>>>> [    9.812889]        ftgmac100_stop+0x5c/0xac
>>>> [    9.812949]        __dev_close_many+0xb8/0x140
>>>
>>> (__dev_close_many requires RTNL)
>>>
>>>
>>> So you have a potential deadlock in this driver. Yes, workqueue items
>>> and RTNL are basically incompatible. Don't do that. Now this bug was
>>> _probably_ added by commit 1baf2e50e48f ("drivers/net/ftgmac100: fix
>>> DHCP potential failure with systemd") which added a call to
>>> ftgmac100_reset() in ftgmac100_adjust_link() which is the thing called
>>> from the PHY state machine in the first place.
>>>
>>> Should that be reverted? I don't know ... maybe it can be fixed
>>> differently.
>>>
>>>
>>> But anyway ... as far as lockdep/workqueue stuff is concerned I'd
>>> definitely call it a win rather than a bug! Yay for making lockdep
>>> useful - it found a deadlock situation for you! :-) No need to blame
>>> lockdep for that :P
>>>
>>
>> So you are saying that anything running in a workqueue must not
>> acquire rtnl_lock because cancel_[delayed_]work_sync() may be called
>> under rtnl_lock.
>>
>> Fair point, but is that documented somewhere ? If not, how is anyone
>> supposed to know ? If it is not documented, I might we well argue that
>> cancel_[delayed_]work_sync() should not be called with rtnl_lock held
>> because some worker might hold that lock.
>>
>> FWIW, it would be nice if the lockdep code would generate some other
>> message in this situation. Complaining about a deadlock involving a
>> lock that doesn't exist if lock debugging isn't enabled is not really
>> helpful and, yes, may result in reporters to falsely assume that this
>> lock is responsible for the potential deadlock.
>>
>> Reverting 1baf2e50e48f does fix the problem as well.
> 
> I would refrain from reverting without giving a fighting chance to the author to address it. It seems a bit strange that we do this locking dance while it might have been simpler to introduce a ftgmac100_reset_unlocked() and ftgmac100_reset() and call both at the intended places, something like the completely untested patch attached maybe?


I agree. As it turns out, there are lots of "workqueue items" in the
kernel calling rtnl_lock(), and at least some of them are canceled
with cancel_delayed_work_sync(). So there has to be some additional
qualifying factor, such as that it is safe to acquire rtnl_lock() in a
worker as long as its cancel function is not called with the lock held.

I don't know if the attached patch would work because I don't know the
impact of executing ftgmac100_reset() without holding the locks.
Also, after all, the problem isnt't the lock itself, but the fact that
cancel_delayed_work_sync() may be called on this specific worker with
rtnl_lock held.

Guenter