netdev - Re: [Issue report] drivers/ftgmac100: DHCP occasionally fails during boot up or link down/up


Message-ID: <1a7e74b4-8827-c14b-7371-9656a643d03c@linux.alibaba.com>
Date:   Sun, 20 Feb 2022 20:26:18 +0800
From:   Heyi Guo <guoheyi@...ux.alibaba.com>
To:     Andrew Lunn <andrew@...n.ch>
Cc:     "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Joel Stanley <joel@....id.au>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Dylan Hung <dylan_hung@...eedtech.com>, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [Issue report] drivers/ftgmac100: DHCP occasionally fails during
 boot up or link down/up

Hi Andrew,

There is indeed a deadlock warning after enabling CONFIG_PROVE_LOCKING:


[   16.852199] ======================================================
[   16.859102] WARNING: possible circular locking dependency detected
[   16.866012] 5.10.36-60b3c9d-dirty-15f4fba #1 Not tainted
[   16.871976] ------------------------------------------------------
[   16.871991] kworker/1:1/23 is trying to acquire lock:
[   16.872000] 80fa0920 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x24/0x28
[   16.872047]
[   16.872047] but task is already holding lock:
[   16.872051] 821d44c0 (&dev->lock){+.+.}-{3:3}, at: phy_state_machine+0x50/0x290
[   16.872076]
[   16.872076] which lock already depends on the new lock.
[   16.872076]
[   16.872080]
[   16.872080] the existing dependency chain (in reverse order) is:
[   16.872083]
[   16.872083] -> #1 (&dev->lock){+.+.}-{3:3}:
[   16.872106]        lock_acquire+0x6c/0x74
[   16.872117]        __mutex_lock+0xb4/0xa48
[   16.872132]        mutex_lock_nested+0x2c/0x34
[   16.872141]        phy_start+0x30/0xc4
[   16.872155]        ftgmac100_open+0x1a0/0x254
[   16.872168]        __dev_open+0x110/0x1d0
[   16.872180]        __dev_change_flags+0x1d0/0x258
[   16.872192]        dev_change_flags+0x28/0x58
[   16.872204]        do_setlink+0x258/0xc60
[   16.872212]        rtnl_setlink+0x110/0x18c
[   16.872219]        rtnetlink_rcv_msg+0x1d0/0x53c
[   16.872226]        netlink_rcv_skb+0xd0/0x128
[   16.872233]        rtnetlink_rcv+0x20/0x24
[   16.872244]        netlink_unicast+0x1a8/0x26c
[   16.872254]        netlink_sendmsg+0x220/0x464
[   16.872265]        __sys_sendto+0xe4/0x134
[   16.872276]        sys_sendto+0x24/0x2c
[   16.872288]        ret_fast_syscall+0x0/0x28
[   16.872297]        0x7ed9e928
[   16.872301]
[   16.872301] -> #0 (rtnl_mutex){+.+.}-{3:3}:
[   16.872325]        __lock_acquire+0x17e8/0x3268
[   16.872331]        lock_acquire.part.0+0xcc/0x394
[   16.872341]        lock_acquire+0x6c/0x74
[   16.872354]        __mutex_lock+0xb4/0xa48
[   16.872365]        mutex_lock_nested+0x2c/0x34
[   16.872377]        rtnl_lock+0x24/0x28
[   16.872389]        ftgmac100_adjust_link+0xc0/0x144
[   16.872401]        phy_link_change+0x38/0x64
[   16.872411]        phy_check_link_status+0xa8/0xfc
[   16.872422]        phy_state_machine+0x80/0x290
[   16.872435]        process_one_work+0x294/0x7d8
[   16.872447]        worker_thread+0x6c/0x548
[   16.872456]        kthread+0x170/0x178
[   16.872462]        ret_from_fork+0x14/0x20
[   16.872467]        0x0
[   16.872471]
[   16.872471] other info that might help us debug this:
[   16.872471]
[   16.872475]  Possible unsafe locking scenario:
[   16.872475]
[   16.872478]        CPU0                    CPU1
[   16.872482]        ----                    ----
[   16.872485]   lock(&dev->lock);
[   16.872495]                                lock(rtnl_mutex);
[   16.872505]                                lock(&dev->lock);
[   16.872513]   lock(rtnl_mutex);
[   16.872522]
[   16.872522]  *** DEADLOCK ***
[   16.872522]
[   16.872528] 3 locks held by kworker/1:1/23:
[   16.872532]  #0: 818472a8 ((wq_completion)events_power_efficient){+.+.}-{0:0}, at: process_one_work+0x1e8/0x7d8
[   16.872558]  #1: 819fbef8 ((work_completion)(&(&dev->state_queue)->work)){+.+.}-{0:0}, at: process_one_work+0x1e8/0x7d8
[   16.872582]  #2: 821d44c0 (&dev->lock){+.+.}-{3:3}, at: phy_state_machine+0x50/0x290
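
To make the cycle in the report explicit, here is a minimal Python sketch that records the two acquisition chains from the trace as "held, then acquired" edges and searches for a loop. It only illustrates the idea of lock-order checking and is not how lockdep is actually implemented; the helper names record_chain and find_cycle are hypothetical, and the lock names are copied from the log above.

```python
from collections import defaultdict


def record_chain(graph, chain):
    """Add a 'held -> acquired next' edge for each nested acquisition."""
    for held, acquired in zip(chain, chain[1:]):
        graph[held].add(acquired)


def find_cycle(graph):
    """Return one cycle as a list of lock names, or None if ordering is consistent."""
    visiting, done = [], set()

    def dfs(node):
        if node in visiting:
            # Reached a lock that is already on the current path: a cycle.
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for nxt in sorted(graph.get(node, ())):
            cycle = dfs(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in sorted(graph):
        cycle = dfs(start)
        if cycle:
            return cycle
    return None


graph = defaultdict(set)
# Chain #1 in the report: the open path runs under rtnl_mutex, and
# ftgmac100_open() -> phy_start() then takes the PHY lock.
record_chain(graph, ["rtnl_mutex", "&dev->lock"])
# Chain #0: phy_state_machine() holds the PHY lock, and
# ftgmac100_adjust_link() then calls rtnl_lock().
record_chain(graph, ["&dev->lock", "rtnl_mutex"])

cycle = find_cycle(graph)
print(cycle)
```

With both chains recorded, find_cycle returns a loop through the two locks, which matches the AB-BA scenario printed by the kernel. If every path took the two locks in the same order, the second chain would add no reverse edge and the function would return None.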

Any advice on how to work around it?

Thanks,

Heyi

On 2022/2/20 at 2:28 AM, Andrew Lunn wrote:
> On Sat, Feb 19, 2022 at 06:08:35PM +0800, Heyi Guo wrote:
>> Hi Andrew,
>>
>> The DHCP issue is gone after applying the patch below. I put the lock statements
>> outside of the pure reset function, because the phydev lock has already been
>> acquired before adjust_link is called. The lock order in ftgmac100_reset_task()
>> was also changed to match the locking sequence in adjust_link, in which the
>> phydev lock is taken first and then rtnl_lock. I'm not quite sure whether this
>> will introduce any potential deadlock. Any advice?
> Did you run the code with CONFIG_PROVE_LOCKING enabled? That will help
> detect possible deadlock situations.
>
>         Andrew