Message-ID: <da9ddec6-ab6d-4ab0-95a7-142af7f0786d@gmail.com>
Date: Sat, 25 Nov 2023 18:36:43 +0100
From: Heiner Kallweit <hkallweit1@...il.com>
To: Ian Chen <free122448@...mail.com>, netdev@...r.kernel.org
Subject: Re: [BUG] r8169: deadlock when NetworkManager brings link up
On 25.11.2023 14:55, Ian Chen wrote:
> Hello,
>
> My home server runs Arch Linux with its stock kernel on a GIGABYTE Z790
> AORUS ELITE AX with its builtin RTL8125B ethernet adapter.
>
> After upgrading from 6.6.1.arch1 to 6.6.2.arch1, booting up the system
> would end up in a state where all operations on any netlink socket
> would block forever. The system is effectively unusable. Here's the
> relevant dmesg:
>
> kernel: INFO: task kworker/u64:2:218 blocked for more than 122 seconds.
> kernel: Not tainted 6.6.2-arch1-1 #1
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> kernel: task:kworker/u64:2 state:D stack:0 pid:218 ppid:2
> flags:0x00004000
> kernel: Workqueue: events_power_efficient crda_timeout_work [cfg80211]
> kernel: Call Trace:
> kernel: <TASK>
> kernel: __schedule+0x3e8/0x1410
> kernel: schedule+0x5e/0xd0
> kernel: schedule_preempt_disabled+0x15/0x30
> kernel: __mutex_lock.constprop.0+0x39a/0x6a0
> kernel: crda_timeout_work+0x10/0x40 [cfg80211
> d1ff02bd631e7b94dc4a8630ea4cdb5aede1cb9b]
> kernel: process_one_work+0x171/0x340
> kernel: worker_thread+0x27b/0x3a0
> kernel: ? __pfx_worker_thread+0x10/0x10
> kernel: kthread+0xe5/0x120
> kernel: ? __pfx_kthread+0x10/0x10
> kernel: ret_from_fork+0x31/0x50
> kernel: ? __pfx_kthread+0x10/0x10
> kernel: ret_from_fork_asm+0x1b/0x30
> kernel: </TASK>
> kernel: INFO: task kworker/5:1:250 blocked for more than 122 seconds.
> kernel: Not tainted 6.6.2-arch1-1 #1
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> kernel: task:kworker/5:1 state:D stack:0 pid:250 ppid:2
> flags:0x00004000
> kernel: Workqueue: events linkwatch_event
> kernel: Call Trace:
> kernel: <TASK>
> kernel: __schedule+0x3e8/0x1410
> kernel: ? sched_clock+0x10/0x30
> kernel: schedule+0x5e/0xd0
> kernel: schedule_preempt_disabled+0x15/0x30
> kernel: __mutex_lock.constprop.0+0x39a/0x6a0
> kernel: linkwatch_event+0x12/0x40
> kernel: process_one_work+0x171/0x340
> kernel: worker_thread+0x27b/0x3a0
> kernel: ? __pfx_worker_thread+0x10/0x10
> kernel: kthread+0xe5/0x120
> kernel: ? __pfx_kthread+0x10/0x10
> kernel: ret_from_fork+0x31/0x50
> kernel: ? __pfx_kthread+0x10/0x10
> kernel: ret_from_fork_asm+0x1b/0x30
> kernel: </TASK>
> kernel: INFO: task kworker/u64:6:290 blocked for more than 122 seconds.
> kernel: Not tainted 6.6.2-arch1-1 #1
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> kernel: task:kworker/u64:6 state:D stack:0 pid:290 ppid:2
> flags:0x00004000
> kernel: Workqueue: netns cleanup_net
> kernel: Call Trace:
> kernel: <TASK>
> kernel: __schedule+0x3e8/0x1410
> kernel: schedule+0x5e/0xd0
> kernel: schedule_preempt_disabled+0x15/0x30
> kernel: __mutex_lock.constprop.0+0x39a/0x6a0
> kernel: wg_netns_pre_exit+0x19/0x100 [wireguard
> 0c090e6018e49e49957d27fd2202b1db304881dc]
> kernel: cleanup_net+0x1e0/0x3b0
> kernel: process_one_work+0x171/0x340
> kernel: worker_thread+0x27b/0x3a0
> kernel: ? __pfx_worker_thread+0x10/0x10
> kernel: kthread+0xe5/0x120
> kernel: ? __pfx_kthread+0x10/0x10
> kernel: ret_from_fork+0x31/0x50
> kernel: ? __pfx_kthread+0x10/0x10
> kernel: ret_from_fork_asm+0x1b/0x30
> kernel: </TASK>
> kernel: INFO: task kworker/u64:19:577 blocked for more than 122
> seconds.
> kernel: Not tainted 6.6.2-arch1-1 #1
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> kernel: task:kworker/u64:19 state:D stack:0 pid:577 ppid:2
> flags:0x00004000
> kernel: Workqueue: events_power_efficient reg_check_chans_work
> [cfg80211]
> kernel: Call Trace:
> kernel: <TASK>
> kernel: __schedule+0x3e8/0x1410
> kernel: ? _get_random_bytes+0xc0/0x1a0
> kernel: schedule+0x5e/0xd0
> kernel: schedule_preempt_disabled+0x15/0x30
> kernel: __mutex_lock.constprop.0+0x39a/0x6a0
> kernel: ? finish_task_switch.isra.0+0x94/0x2f0
> kernel: reg_check_chans_work+0x31/0x5b0 [cfg80211
> d1ff02bd631e7b94dc4a8630ea4cdb5aede1cb9b]
> kernel: process_one_work+0x171/0x340
> kernel: worker_thread+0x27b/0x3a0
> kernel: ? __pfx_worker_thread+0x10/0x10
> kernel: kthread+0xe5/0x120
> kernel: ? __pfx_kthread+0x10/0x10
> kernel: ret_from_fork+0x31/0x50
> kernel: ? __pfx_kthread+0x10/0x10
> kernel: ret_from_fork_asm+0x1b/0x30
> kernel: </TASK>
> kernel: INFO: task kworker/u64:23:581 blocked for more than 122
> seconds.
> kernel: Not tainted 6.6.2-arch1-1 #1
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> kernel: task:kworker/u64:23 state:D stack:0 pid:581 ppid:2
> flags:0x00004000
> kernel: Workqueue: events_power_efficient phy_state_machine [libphy]
> kernel: Call Trace:
> kernel: <TASK>
> kernel: __schedule+0x3e8/0x1410
> kernel: schedule+0x5e/0xd0
> kernel: schedule_preempt_disabled+0x15/0x30
> kernel: __mutex_lock.constprop.0+0x39a/0x6a0
> kernel: phy_state_machine+0x47/0x2c0 [libphy
> 93248cd1d88abf54f1b4cc64a990177f549a7710]
> kernel: process_one_work+0x171/0x340
> kernel: worker_thread+0x27b/0x3a0
> kernel: ? __pfx_worker_thread+0x10/0x10
> kernel: kthread+0xe5/0x120
> kernel: ? __pfx_kthread+0x10/0x10
> kernel: ret_from_fork+0x31/0x50
> kernel: ? __pfx_kthread+0x10/0x10
> kernel: ret_from_fork_asm+0x1b/0x30
> kernel: </TASK>
> kernel: INFO: task NetworkManager:849 blocked for more than 122
> seconds.
> kernel: Not tainted 6.6.2-arch1-1 #1
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> kernel: task:NetworkManager state:D stack:0 pid:849 ppid:1
> flags:0x00004002
> kernel: Call Trace:
> kernel: <TASK>
> kernel: __schedule+0x3e8/0x1410
> kernel: ? sysvec_apic_timer_interrupt+0xe/0x90
> kernel: schedule+0x5e/0xd0
> kernel: schedule_preempt_disabled+0x15/0x30
> kernel: __mutex_lock.constprop.0+0x39a/0x6a0
> kernel: ? pci_conf1_write+0xae/0xf0
> kernel: ? pcie_set_readrq+0x8e/0x160
> kernel: phy_start_aneg+0x1d/0x40 [libphy
> 93248cd1d88abf54f1b4cc64a990177f549a7710]
> kernel: rtl_reset_work+0x1bd/0x3b0 [r8169
> 08653ab60f23923c3943d53f140b2b697e265b93]
> kernel: r8169_phylink_handler+0x5b/0x240 [r8169
> 08653ab60f23923c3943d53f140b2b697e265b93]
> kernel: phy_link_change+0x2e/0x60 [libphy
> 93248cd1d88abf54f1b4cc64a990177f549a7710]
> kernel: phy_check_link_status+0xad/0xe0 [libphy
> 93248cd1d88abf54f1b4cc64a990177f549a7710]
> kernel: phy_start_aneg+0x25/0x40 [libphy
> 93248cd1d88abf54f1b4cc64a990177f549a7710]
> kernel: rtl8169_change_mtu+0x24/0x60 [r8169
> 08653ab60f23923c3943d53f140b2b697e265b93]
> kernel: dev_set_mtu_ext+0xf1/0x200
> kernel: ? select_task_rq_fair+0x82c/0x1dd0
> kernel: do_setlink+0x291/0x12d0
> kernel: ? remove_entity_load_avg+0x31/0x80
> kernel: ? sched_clock+0x10/0x30
> kernel: ? sched_clock_cpu+0xf/0x190
> kernel: ? __smp_call_single_queue+0xad/0x120
> kernel: ? ttwu_queue_wakelist+0xef/0x110
> kernel: ? __nla_validate_parse+0x61/0xd10
> kernel: ? try_to_wake_up+0x2b7/0x640
> kernel: __rtnl_newlink+0x651/0xa10
> kernel: ? __kmem_cache_alloc_node+0x1a6/0x340
> kernel: ? rtnl_newlink+0x2e/0x70
> kernel: rtnl_newlink+0x47/0x70
> kernel: rtnetlink_rcv_msg+0x14f/0x3c0
> kernel: ? number+0x33b/0x3d0
> kernel: ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> kernel: netlink_rcv_skb+0x58/0x110
> kernel: netlink_unicast+0x1a3/0x290
> kernel: netlink_sendmsg+0x254/0x4d0
> kernel: ____sys_sendmsg+0x396/0x3d0
> kernel: ? copy_msghdr_from_user+0x7d/0xc0
> kernel: ___sys_sendmsg+0x9a/0xe0
> kernel: __sys_sendmsg+0x7a/0xd0
> kernel: do_syscall_64+0x5d/0x90
> kernel: ? do_syscall_64+0x6c/0x90
> kernel: ? do_syscall_64+0x6c/0x90
> kernel: ? do_syscall_64+0x6c/0x90
> kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> kernel: RIP: 0033:0x7fc9232e7b3d
> kernel: RSP: 002b:00007fffd4df2830 EFLAGS: 00000293 ORIG_RAX:
> 000000000000002e
> kernel: RAX: ffffffffffffffda RBX: 0000000000000055 RCX:
> 00007fc9232e7b3d
> kernel: RDX: 0000000000000000 RSI: 00007fffd4df2870 RDI:
> 000000000000000d
> kernel: RBP: 00007fffd4df2c40 R08: 0000000000000000 R09:
> 0000000000000000
> kernel: R10: 0000000000000000 R11: 0000000000000293 R12:
> 0000563fe71367c0
> kernel: R13: 0000000000000001 R14: 0000000000000000 R15:
> 0000000000000000
> kernel: </TASK>
> kernel: INFO: task geoclue:1358 blocked for more than 122 seconds.
> kernel: Not tainted 6.6.2-arch1-1 #1
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> kernel: task:geoclue state:D stack:0 pid:1358 ppid:1
> flags:0x00000002
> kernel: Call Trace:
> kernel: <TASK>
> kernel: __schedule+0x3e8/0x1410
> kernel: schedule+0x5e/0xd0
> kernel: schedule_preempt_disabled+0x15/0x30
> kernel: __mutex_lock.constprop.0+0x39a/0x6a0
> kernel: __netlink_dump_start+0x75/0x290
> kernel: ? __pfx_rtnl_dump_all+0x10/0x10
> kernel: rtnetlink_rcv_msg+0x277/0x3c0
> kernel: ? __pfx_rtnl_dump_all+0x10/0x10
> kernel: ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> kernel: netlink_rcv_skb+0x58/0x110
> kernel: netlink_unicast+0x1a3/0x290
> kernel: netlink_sendmsg+0x254/0x4d0
> kernel: __sys_sendto+0x1f6/0x200
> kernel: __x64_sys_sendto+0x24/0x30
> kernel: do_syscall_64+0x5d/0x90
> kernel: ? do_syscall_64+0x6c/0x90
> kernel: ? do_syscall_64+0x6c/0x90
> kernel: ? syscall_exit_to_user_mode+0x2b/0x40
> kernel: ? do_syscall_64+0x6c/0x90
> kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> kernel: RIP: 0033:0x7f977ae729ec
> kernel: RSP: 002b:00007ffeeb6aba50 EFLAGS: 00000246 ORIG_RAX:
> 000000000000002c
> kernel: RAX: ffffffffffffffda RBX: 000056084849e910 RCX:
> 00007f977ae729ec
> kernel: RDX: 0000000000000014 RSI: 00007ffeeb6abad0 RDI:
> 0000000000000007
> kernel: RBP: 0000000000000000 R08: 0000000000000000 R09:
> 0000000000000000
> kernel: R10: 0000000000004000 R11: 0000000000000246 R12:
> 0000000000000014
> kernel: R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000000
> kernel: </TASK>
> kernel: INFO: task pool-gnome-shel:1986 blocked for more than 122
> seconds.
> kernel: Not tainted 6.6.2-arch1-1 #1
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> kernel: task:pool-gnome-shel state:D stack:0 pid:1986 ppid:1513
> flags:0x00000002
> kernel: Call Trace:
> kernel: <TASK>
> kernel: __schedule+0x3e8/0x1410
> kernel: schedule+0x5e/0xd0
> kernel: schedule_preempt_disabled+0x15/0x30
> kernel: __mutex_lock.constprop.0+0x39a/0x6a0
> kernel: __netlink_dump_start+0x75/0x290
> kernel: ? __pfx_rtnl_dump_all+0x10/0x10
> kernel: rtnetlink_rcv_msg+0x277/0x3c0
> kernel: ? __pfx_rtnl_dump_all+0x10/0x10
> kernel: ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> kernel: netlink_rcv_skb+0x58/0x110
> kernel: netlink_unicast+0x1a3/0x290
> kernel: netlink_sendmsg+0x254/0x4d0
> kernel: __sys_sendto+0x1f6/0x200
> kernel: __x64_sys_sendto+0x24/0x30
> kernel: do_syscall_64+0x5d/0x90
> kernel: ? syscall_exit_to_user_mode+0x2b/0x40
> kernel: ? do_syscall_64+0x6c/0x90
> kernel: ? exc_page_fault+0x7f/0x180
> kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> kernel: RIP: 0033:0x7f232af30bfc
> kernel: RSP: 002b:00007f223e1fbba0 EFLAGS: 00000293 ORIG_RAX:
> 000000000000002c
> kernel: RAX: ffffffffffffffda RBX: 00007f223e1fccc0 RCX:
> 00007f232af30bfc
> kernel: RDX: 0000000000000014 RSI: 00007f223e1fccc0 RDI:
> 0000000000000028
> kernel: RBP: 0000000000000000 R08: 00007f223e1fcc64 R09:
> 000000000000000c
> kernel: R10: 0000000000000000 R11: 0000000000000293 R12:
> 0000000000000028
> kernel: R13: 00007f223e1fcc80 R14: 0000000000000665 R15:
> 000055638262fd10
> kernel: </TASK>
> kernel: INFO: task evolution-sourc:1819 blocked for more than 122
> seconds.
> kernel: Not tainted 6.6.2-arch1-1 #1
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> kernel: task:evolution-sourc state:D stack:0 pid:1819 ppid:1513
> flags:0x00000006
> kernel: Call Trace:
> kernel: <TASK>
> kernel: __schedule+0x3e8/0x1410
> kernel: schedule+0x5e/0xd0
> kernel: schedule_preempt_disabled+0x15/0x30
> kernel: __mutex_lock.constprop.0+0x39a/0x6a0
> kernel: ? netlink_lookup+0x151/0x1d0
> kernel: __netlink_dump_start+0x75/0x290
> kernel: ? __pfx_rtnl_dump_all+0x10/0x10
> kernel: rtnetlink_rcv_msg+0x277/0x3c0
> kernel: ? __pfx_rtnl_dump_all+0x10/0x10
> kernel: ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> kernel: netlink_rcv_skb+0x58/0x110
> kernel: netlink_unicast+0x1a3/0x290
> kernel: netlink_sendmsg+0x254/0x4d0
> kernel: __sys_sendto+0x1f6/0x200
> kernel: __x64_sys_sendto+0x24/0x30
> kernel: do_syscall_64+0x5d/0x90
> kernel: ? do_syscall_64+0x6c/0x90
> kernel: ? sock_getsockopt+0x22/0x30
> kernel: ? __fget_light+0x99/0x100
> kernel: ? __sys_setsockopt+0x129/0x1d0
> kernel: ? syscall_exit_to_user_mode+0x2b/0x40
> kernel: ? do_syscall_64+0x6c/0x90
> kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> kernel: RIP: 0033:0x7f6aa096c9ec
> kernel: RSP: 002b:00007fff2b442820 EFLAGS: 00000246 ORIG_RAX:
> 000000000000002c
> kernel: RAX: ffffffffffffffda RBX: 0000561e6b466d80 RCX:
> 00007f6aa096c9ec
> kernel: RDX: 0000000000000014 RSI: 00007fff2b4428a0 RDI:
> 000000000000000a
> kernel: RBP: 0000000000000000 R08: 0000000000000000 R09:
> 0000000000000000
> kernel: R10: 0000000000004000 R11: 0000000000000246 R12:
> 0000000000000014
> kernel: R13: 00007fff2b442a70 R14: 0000000000000000 R15:
> 0000000000000001
> kernel: </TASK>
> kernel: INFO: task gnome-software:1904 blocked for more than 122
> seconds.
> kernel: Not tainted 6.6.2-arch1-1 #1
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> kernel: task:gnome-software state:D stack:0 pid:1904 ppid:1613
> flags:0x00000002
> kernel: Call Trace:
> kernel: <TASK>
> kernel: __schedule+0x3e8/0x1410
> kernel: ? __pte_offset_map_lock+0x9e/0x110
> kernel: schedule+0x5e/0xd0
> kernel: schedule_preempt_disabled+0x15/0x30
> kernel: __mutex_lock.constprop.0+0x39a/0x6a0
> kernel: ? netlink_lookup+0x151/0x1d0
> kernel: __netlink_dump_start+0x75/0x290
> kernel: ? __pfx_rtnl_dump_all+0x10/0x10
> kernel: rtnetlink_rcv_msg+0x277/0x3c0
> kernel: ? __pfx_rtnl_dump_all+0x10/0x10
> kernel: ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> kernel: netlink_rcv_skb+0x58/0x110
> kernel: netlink_unicast+0x1a3/0x290
> kernel: netlink_sendmsg+0x254/0x4d0
> kernel: __sys_sendto+0x1f6/0x200
> kernel: __x64_sys_sendto+0x24/0x30
> kernel: do_syscall_64+0x5d/0x90
> kernel: ? __fget_light+0x99/0x100
> kernel: ? __sys_setsockopt+0x129/0x1d0
> kernel: ? syscall_exit_to_user_mode+0x2b/0x40
> kernel: ? do_syscall_64+0x6c/0x90
> kernel: ? syscall_exit_to_user_mode+0x2b/0x40
> kernel: ? do_syscall_64+0x6c/0x90
> kernel: ? exc_page_fault+0x7f/0x180
> kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> kernel: RIP: 0033:0x7fdbfd26d9ec
> kernel: RSP: 002b:00007ffd15dd63e0 EFLAGS: 00000246 ORIG_RAX:
> 000000000000002c
> kernel: RAX: ffffffffffffffda RBX: 000056133c78f580 RCX:
> 00007fdbfd26d9ec
> kernel: RDX: 0000000000000014 RSI: 00007ffd15dd6460 RDI:
> 000000000000000b
> kernel: RBP: 0000000000000000 R08: 0000000000000000 R09:
> 0000000000000000
> kernel: R10: 0000000000004000 R11: 0000000000000246 R12:
> 0000000000000014
> kernel: R13: 00007ffd15dd6630 R14: 0000000000000000 R15:
> 0000000000000001
> kernel: </TASK>
> kernel: Future hung task reports are suppressed, see sysctl
> kernel.hung_task_warnings
>
> From the call traces, it seems that the issue is caused by commit
> 621735f590643e3048ca2060c285b80551660601 (r8169: fix rare issue with
> broken rx after link-down on RTL8125), which got backported to 6.6.2.
>
> Ian
Could you please test whether the following fixes the issue for you?
---
drivers/net/ethernet/realtek/r8169_main.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 0aed99a20..e32cc3279 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -575,6 +575,7 @@ struct rtl8169_tc_offsets {
 enum rtl_flag {
 	RTL_FLAG_TASK_ENABLED = 0,
 	RTL_FLAG_TASK_RESET_PENDING,
+	RTL_FLAG_TASK_RESET_NO_QUEUE_WAKE,
 	RTL_FLAG_TASK_TX_TIMEOUT,
 	RTL_FLAG_MAX
 };
@@ -4494,6 +4495,8 @@ static void rtl_task(struct work_struct *work)
 reset:
 		rtl_reset_work(tp);
 		netif_wake_queue(tp->dev);
+	} else if (test_and_clear_bit(RTL_FLAG_TASK_RESET_NO_QUEUE_WAKE, tp->wk.flags)) {
+		rtl_reset_work(tp);
 	}
 out_unlock:
 	rtnl_unlock();
@@ -4527,7 +4530,7 @@ static void r8169_phylink_handler(struct net_device *ndev)
 	} else {
 		/* In few cases rx is broken after link-down otherwise */
 		if (rtl_is_8125(tp))
-			rtl_reset_work(tp);
+			rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_NO_QUEUE_WAKE);
 		pm_runtime_idle(d);
 	}
@@ -4603,7 +4606,7 @@ static int rtl8169_close(struct net_device *dev)
 	rtl8169_down(tp);
 	rtl8169_rx_clear(tp);
-	cancel_work_sync(&tp->wk.work);
+	cancel_work(&tp->wk.work);
 	free_irq(tp->irq, tp);
--
2.43.0
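
For anyone following along: in the NetworkManager trace quoted above, phy_start_aneg shows up twice in the same call chain, with the link-change handler and rtl_reset_work in between. The inner call then blocks in __mutex_lock, which looks like the same task trying to take a lock it already holds. The patch avoids this by only setting a flag in the handler and doing the reset later from the work item. Below is a minimal userspace sketch of that pattern, not driver code. All names in it (phy_lock, do_reset, change_setting and so on) are hypothetical stand-ins, and it uses an error-checking pthread mutex so that the relock is reported as EDEADLK instead of hanging the way a kernel mutex does.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; these are not kernel symbols. */
static pthread_mutex_t phy_lock;
static bool reset_pending;

/* An error-checking mutex reports EDEADLK on relock instead of hanging. */
static void init_lock(void)
{
	pthread_mutexattr_t attr;
	int err;

	err = pthread_mutexattr_init(&attr);
	assert(err == 0);
	err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(err == 0);
	err = pthread_mutex_init(&phy_lock, &attr);
	assert(err == 0);
	pthread_mutexattr_destroy(&attr);
	(void)err;
}

/* Models a reset that takes the lock itself. Returns 0 or an errno value. */
static int do_reset(void)
{
	int err = pthread_mutex_lock(&phy_lock);

	if (err)
		return err;	/* lock already held by this thread */
	pthread_mutex_unlock(&phy_lock);
	return 0;
}

/* Link-down callback that resets right away, with the lock still held. */
static int link_down_direct(void)
{
	return do_reset();
}

/* Link-down callback that only records that a reset is wanted. */
static int link_down_deferred(void)
{
	reset_pending = true;
	return 0;
}

/* Models an operation that invokes the link callback under the lock. */
static int change_setting(int (*link_cb)(void))
{
	int err;

	pthread_mutex_lock(&phy_lock);
	err = link_cb();
	pthread_mutex_unlock(&phy_lock);
	return err;
}

/* Models the work item: runs later, after the lock has been released. */
static int run_pending_work(void)
{
	if (!reset_pending)
		return 0;
	reset_pending = false;
	return do_reset();
}
```

After init_lock(), change_setting(link_down_direct) returns EDEADLK, while change_setting(link_down_deferred) followed by run_pending_work() returns 0 both times. The sketch only covers the lock re-entry; it does not model rtnl_lock or the cancel_work_sync() to cancel_work() change in rtl8169_close().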