Date: Sat, 25 Nov 2023 21:55:00 +0800
From: Ian Chen <free122448@...mail.com>
To: netdev@...r.kernel.org
Cc: Heiner Kallweit <hkallweit1@...il.com>
Subject: [BUG] r8169: deadlock when NetworkManager brings link up

Hello,

My home server runs Arch Linux with the stock kernel on a GIGABYTE Z790
AORUS ELITE AX with its built-in RTL8125B Ethernet adapter.

After upgrading from 6.6.1.arch1 to 6.6.2.arch1, the system boots into
a state where every operation on any netlink socket blocks forever,
which makes it effectively unusable. Here is the relevant dmesg:

kernel: INFO: task kworker/u64:2:218 blocked for more than 122 seconds.
kernel:       Not tainted 6.6.2-arch1-1 #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
kernel: task:kworker/u64:2   state:D stack:0     pid:218   ppid:2     
flags:0x00004000
kernel: Workqueue: events_power_efficient crda_timeout_work [cfg80211]
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x3e8/0x1410
kernel:  schedule+0x5e/0xd0
kernel:  schedule_preempt_disabled+0x15/0x30
kernel:  __mutex_lock.constprop.0+0x39a/0x6a0
kernel:  crda_timeout_work+0x10/0x40 [cfg80211
d1ff02bd631e7b94dc4a8630ea4cdb5aede1cb9b]
kernel:  process_one_work+0x171/0x340
kernel:  worker_thread+0x27b/0x3a0
kernel:  ? __pfx_worker_thread+0x10/0x10
kernel:  kthread+0xe5/0x120
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork+0x31/0x50
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork_asm+0x1b/0x30
kernel:  </TASK>
kernel: INFO: task kworker/5:1:250 blocked for more than 122 seconds.
kernel:       Not tainted 6.6.2-arch1-1 #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
kernel: task:kworker/5:1     state:D stack:0     pid:250   ppid:2     
flags:0x00004000
kernel: Workqueue: events linkwatch_event
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x3e8/0x1410
kernel:  ? sched_clock+0x10/0x30
kernel:  schedule+0x5e/0xd0
kernel:  schedule_preempt_disabled+0x15/0x30
kernel:  __mutex_lock.constprop.0+0x39a/0x6a0
kernel:  linkwatch_event+0x12/0x40
kernel:  process_one_work+0x171/0x340
kernel:  worker_thread+0x27b/0x3a0
kernel:  ? __pfx_worker_thread+0x10/0x10
kernel:  kthread+0xe5/0x120
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork+0x31/0x50
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork_asm+0x1b/0x30
kernel:  </TASK>
kernel: INFO: task kworker/u64:6:290 blocked for more than 122 seconds.
kernel:       Not tainted 6.6.2-arch1-1 #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
kernel: task:kworker/u64:6   state:D stack:0     pid:290   ppid:2     
flags:0x00004000
kernel: Workqueue: netns cleanup_net
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x3e8/0x1410
kernel:  schedule+0x5e/0xd0
kernel:  schedule_preempt_disabled+0x15/0x30
kernel:  __mutex_lock.constprop.0+0x39a/0x6a0
kernel:  wg_netns_pre_exit+0x19/0x100 [wireguard
0c090e6018e49e49957d27fd2202b1db304881dc]
kernel:  cleanup_net+0x1e0/0x3b0
kernel:  process_one_work+0x171/0x340
kernel:  worker_thread+0x27b/0x3a0
kernel:  ? __pfx_worker_thread+0x10/0x10
kernel:  kthread+0xe5/0x120
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork+0x31/0x50
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork_asm+0x1b/0x30
kernel:  </TASK>
kernel: INFO: task kworker/u64:19:577 blocked for more than 122
seconds.
kernel:       Not tainted 6.6.2-arch1-1 #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
kernel: task:kworker/u64:19  state:D stack:0     pid:577   ppid:2     
flags:0x00004000
kernel: Workqueue: events_power_efficient reg_check_chans_work
[cfg80211]
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x3e8/0x1410
kernel:  ? _get_random_bytes+0xc0/0x1a0
kernel:  schedule+0x5e/0xd0
kernel:  schedule_preempt_disabled+0x15/0x30
kernel:  __mutex_lock.constprop.0+0x39a/0x6a0
kernel:  ? finish_task_switch.isra.0+0x94/0x2f0
kernel:  reg_check_chans_work+0x31/0x5b0 [cfg80211
d1ff02bd631e7b94dc4a8630ea4cdb5aede1cb9b]
kernel:  process_one_work+0x171/0x340
kernel:  worker_thread+0x27b/0x3a0
kernel:  ? __pfx_worker_thread+0x10/0x10
kernel:  kthread+0xe5/0x120
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork+0x31/0x50
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork_asm+0x1b/0x30
kernel:  </TASK>
kernel: INFO: task kworker/u64:23:581 blocked for more than 122
seconds.
kernel:       Not tainted 6.6.2-arch1-1 #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
kernel: task:kworker/u64:23  state:D stack:0     pid:581   ppid:2     
flags:0x00004000
kernel: Workqueue: events_power_efficient phy_state_machine [libphy]
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x3e8/0x1410
kernel:  schedule+0x5e/0xd0
kernel:  schedule_preempt_disabled+0x15/0x30
kernel:  __mutex_lock.constprop.0+0x39a/0x6a0
kernel:  phy_state_machine+0x47/0x2c0 [libphy
93248cd1d88abf54f1b4cc64a990177f549a7710]
kernel:  process_one_work+0x171/0x340
kernel:  worker_thread+0x27b/0x3a0
kernel:  ? __pfx_worker_thread+0x10/0x10
kernel:  kthread+0xe5/0x120
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork+0x31/0x50
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork_asm+0x1b/0x30
kernel:  </TASK>
kernel: INFO: task NetworkManager:849 blocked for more than 122
seconds.
kernel:       Not tainted 6.6.2-arch1-1 #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
kernel: task:NetworkManager  state:D stack:0     pid:849   ppid:1     
flags:0x00004002
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x3e8/0x1410
kernel:  ? sysvec_apic_timer_interrupt+0xe/0x90
kernel:  schedule+0x5e/0xd0
kernel:  schedule_preempt_disabled+0x15/0x30
kernel:  __mutex_lock.constprop.0+0x39a/0x6a0
kernel:  ? pci_conf1_write+0xae/0xf0
kernel:  ? pcie_set_readrq+0x8e/0x160
kernel:  phy_start_aneg+0x1d/0x40 [libphy
93248cd1d88abf54f1b4cc64a990177f549a7710]
kernel:  rtl_reset_work+0x1bd/0x3b0 [r8169
08653ab60f23923c3943d53f140b2b697e265b93]
kernel:  r8169_phylink_handler+0x5b/0x240 [r8169
08653ab60f23923c3943d53f140b2b697e265b93]
kernel:  phy_link_change+0x2e/0x60 [libphy
93248cd1d88abf54f1b4cc64a990177f549a7710]
kernel:  phy_check_link_status+0xad/0xe0 [libphy
93248cd1d88abf54f1b4cc64a990177f549a7710]
kernel:  phy_start_aneg+0x25/0x40 [libphy
93248cd1d88abf54f1b4cc64a990177f549a7710]
kernel:  rtl8169_change_mtu+0x24/0x60 [r8169
08653ab60f23923c3943d53f140b2b697e265b93]
kernel:  dev_set_mtu_ext+0xf1/0x200
kernel:  ? select_task_rq_fair+0x82c/0x1dd0
kernel:  do_setlink+0x291/0x12d0
kernel:  ? remove_entity_load_avg+0x31/0x80
kernel:  ? sched_clock+0x10/0x30
kernel:  ? sched_clock_cpu+0xf/0x190
kernel:  ? __smp_call_single_queue+0xad/0x120
kernel:  ? ttwu_queue_wakelist+0xef/0x110
kernel:  ? __nla_validate_parse+0x61/0xd10
kernel:  ? try_to_wake_up+0x2b7/0x640
kernel:  __rtnl_newlink+0x651/0xa10
kernel:  ? __kmem_cache_alloc_node+0x1a6/0x340
kernel:  ? rtnl_newlink+0x2e/0x70
kernel:  rtnl_newlink+0x47/0x70
kernel:  rtnetlink_rcv_msg+0x14f/0x3c0
kernel:  ? number+0x33b/0x3d0
kernel:  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
kernel:  netlink_rcv_skb+0x58/0x110
kernel:  netlink_unicast+0x1a3/0x290
kernel:  netlink_sendmsg+0x254/0x4d0
kernel:  ____sys_sendmsg+0x396/0x3d0
kernel:  ? copy_msghdr_from_user+0x7d/0xc0
kernel:  ___sys_sendmsg+0x9a/0xe0
kernel:  __sys_sendmsg+0x7a/0xd0
kernel:  do_syscall_64+0x5d/0x90
kernel:  ? do_syscall_64+0x6c/0x90
kernel:  ? do_syscall_64+0x6c/0x90
kernel:  ? do_syscall_64+0x6c/0x90
kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
kernel: RIP: 0033:0x7fc9232e7b3d
kernel: RSP: 002b:00007fffd4df2830 EFLAGS: 00000293 ORIG_RAX:
000000000000002e
kernel: RAX: ffffffffffffffda RBX: 0000000000000055 RCX:
00007fc9232e7b3d
kernel: RDX: 0000000000000000 RSI: 00007fffd4df2870 RDI:
000000000000000d
kernel: RBP: 00007fffd4df2c40 R08: 0000000000000000 R09:
0000000000000000
kernel: R10: 0000000000000000 R11: 0000000000000293 R12:
0000563fe71367c0
kernel: R13: 0000000000000001 R14: 0000000000000000 R15:
0000000000000000
kernel:  </TASK>
kernel: INFO: task geoclue:1358 blocked for more than 122 seconds.
kernel:       Not tainted 6.6.2-arch1-1 #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
kernel: task:geoclue         state:D stack:0     pid:1358  ppid:1     
flags:0x00000002
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x3e8/0x1410
kernel:  schedule+0x5e/0xd0
kernel:  schedule_preempt_disabled+0x15/0x30
kernel:  __mutex_lock.constprop.0+0x39a/0x6a0
kernel:  __netlink_dump_start+0x75/0x290
kernel:  ? __pfx_rtnl_dump_all+0x10/0x10
kernel:  rtnetlink_rcv_msg+0x277/0x3c0
kernel:  ? __pfx_rtnl_dump_all+0x10/0x10
kernel:  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
kernel:  netlink_rcv_skb+0x58/0x110
kernel:  netlink_unicast+0x1a3/0x290
kernel:  netlink_sendmsg+0x254/0x4d0
kernel:  __sys_sendto+0x1f6/0x200
kernel:  __x64_sys_sendto+0x24/0x30
kernel:  do_syscall_64+0x5d/0x90
kernel:  ? do_syscall_64+0x6c/0x90
kernel:  ? do_syscall_64+0x6c/0x90
kernel:  ? syscall_exit_to_user_mode+0x2b/0x40
kernel:  ? do_syscall_64+0x6c/0x90
kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
kernel: RIP: 0033:0x7f977ae729ec
kernel: RSP: 002b:00007ffeeb6aba50 EFLAGS: 00000246 ORIG_RAX:
000000000000002c
kernel: RAX: ffffffffffffffda RBX: 000056084849e910 RCX:
00007f977ae729ec
kernel: RDX: 0000000000000014 RSI: 00007ffeeb6abad0 RDI:
0000000000000007
kernel: RBP: 0000000000000000 R08: 0000000000000000 R09:
0000000000000000
kernel: R10: 0000000000004000 R11: 0000000000000246 R12:
0000000000000014
kernel: R13: 0000000000000000 R14: 0000000000000000 R15:
0000000000000000
kernel:  </TASK>
kernel: INFO: task pool-gnome-shel:1986 blocked for more than 122
seconds.
kernel:       Not tainted 6.6.2-arch1-1 #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
kernel: task:pool-gnome-shel state:D stack:0     pid:1986  ppid:1513  
flags:0x00000002
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x3e8/0x1410
kernel:  schedule+0x5e/0xd0
kernel:  schedule_preempt_disabled+0x15/0x30
kernel:  __mutex_lock.constprop.0+0x39a/0x6a0
kernel:  __netlink_dump_start+0x75/0x290
kernel:  ? __pfx_rtnl_dump_all+0x10/0x10
kernel:  rtnetlink_rcv_msg+0x277/0x3c0
kernel:  ? __pfx_rtnl_dump_all+0x10/0x10
kernel:  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
kernel:  netlink_rcv_skb+0x58/0x110
kernel:  netlink_unicast+0x1a3/0x290
kernel:  netlink_sendmsg+0x254/0x4d0
kernel:  __sys_sendto+0x1f6/0x200
kernel:  __x64_sys_sendto+0x24/0x30
kernel:  do_syscall_64+0x5d/0x90
kernel:  ? syscall_exit_to_user_mode+0x2b/0x40
kernel:  ? do_syscall_64+0x6c/0x90
kernel:  ? exc_page_fault+0x7f/0x180
kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
kernel: RIP: 0033:0x7f232af30bfc
kernel: RSP: 002b:00007f223e1fbba0 EFLAGS: 00000293 ORIG_RAX:
000000000000002c
kernel: RAX: ffffffffffffffda RBX: 00007f223e1fccc0 RCX:
00007f232af30bfc
kernel: RDX: 0000000000000014 RSI: 00007f223e1fccc0 RDI:
0000000000000028
kernel: RBP: 0000000000000000 R08: 00007f223e1fcc64 R09:
000000000000000c
kernel: R10: 0000000000000000 R11: 0000000000000293 R12:
0000000000000028
kernel: R13: 00007f223e1fcc80 R14: 0000000000000665 R15:
000055638262fd10
kernel:  </TASK>
kernel: INFO: task evolution-sourc:1819 blocked for more than 122
seconds.
kernel:       Not tainted 6.6.2-arch1-1 #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
kernel: task:evolution-sourc state:D stack:0     pid:1819  ppid:1513  
flags:0x00000006
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x3e8/0x1410
kernel:  schedule+0x5e/0xd0
kernel:  schedule_preempt_disabled+0x15/0x30
kernel:  __mutex_lock.constprop.0+0x39a/0x6a0
kernel:  ? netlink_lookup+0x151/0x1d0
kernel:  __netlink_dump_start+0x75/0x290
kernel:  ? __pfx_rtnl_dump_all+0x10/0x10
kernel:  rtnetlink_rcv_msg+0x277/0x3c0
kernel:  ? __pfx_rtnl_dump_all+0x10/0x10
kernel:  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
kernel:  netlink_rcv_skb+0x58/0x110
kernel:  netlink_unicast+0x1a3/0x290
kernel:  netlink_sendmsg+0x254/0x4d0
kernel:  __sys_sendto+0x1f6/0x200
kernel:  __x64_sys_sendto+0x24/0x30
kernel:  do_syscall_64+0x5d/0x90
kernel:  ? do_syscall_64+0x6c/0x90
kernel:  ? sock_getsockopt+0x22/0x30
kernel:  ? __fget_light+0x99/0x100
kernel:  ? __sys_setsockopt+0x129/0x1d0
kernel:  ? syscall_exit_to_user_mode+0x2b/0x40
kernel:  ? do_syscall_64+0x6c/0x90
kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
kernel: RIP: 0033:0x7f6aa096c9ec
kernel: RSP: 002b:00007fff2b442820 EFLAGS: 00000246 ORIG_RAX:
000000000000002c
kernel: RAX: ffffffffffffffda RBX: 0000561e6b466d80 RCX:
00007f6aa096c9ec
kernel: RDX: 0000000000000014 RSI: 00007fff2b4428a0 RDI:
000000000000000a
kernel: RBP: 0000000000000000 R08: 0000000000000000 R09:
0000000000000000
kernel: R10: 0000000000004000 R11: 0000000000000246 R12:
0000000000000014
kernel: R13: 00007fff2b442a70 R14: 0000000000000000 R15:
0000000000000001
kernel:  </TASK>
kernel: INFO: task gnome-software:1904 blocked for more than 122
seconds.
kernel:       Not tainted 6.6.2-arch1-1 #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
kernel: task:gnome-software  state:D stack:0     pid:1904  ppid:1613  
flags:0x00000002
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x3e8/0x1410
kernel:  ? __pte_offset_map_lock+0x9e/0x110
kernel:  schedule+0x5e/0xd0
kernel:  schedule_preempt_disabled+0x15/0x30
kernel:  __mutex_lock.constprop.0+0x39a/0x6a0
kernel:  ? netlink_lookup+0x151/0x1d0
kernel:  __netlink_dump_start+0x75/0x290
kernel:  ? __pfx_rtnl_dump_all+0x10/0x10
kernel:  rtnetlink_rcv_msg+0x277/0x3c0
kernel:  ? __pfx_rtnl_dump_all+0x10/0x10
kernel:  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
kernel:  netlink_rcv_skb+0x58/0x110
kernel:  netlink_unicast+0x1a3/0x290
kernel:  netlink_sendmsg+0x254/0x4d0
kernel:  __sys_sendto+0x1f6/0x200
kernel:  __x64_sys_sendto+0x24/0x30
kernel:  do_syscall_64+0x5d/0x90
kernel:  ? __fget_light+0x99/0x100
kernel:  ? __sys_setsockopt+0x129/0x1d0
kernel:  ? syscall_exit_to_user_mode+0x2b/0x40
kernel:  ? do_syscall_64+0x6c/0x90
kernel:  ? syscall_exit_to_user_mode+0x2b/0x40
kernel:  ? do_syscall_64+0x6c/0x90
kernel:  ? exc_page_fault+0x7f/0x180
kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
kernel: RIP: 0033:0x7fdbfd26d9ec
kernel: RSP: 002b:00007ffd15dd63e0 EFLAGS: 00000246 ORIG_RAX:
000000000000002c
kernel: RAX: ffffffffffffffda RBX: 000056133c78f580 RCX:
00007fdbfd26d9ec
kernel: RDX: 0000000000000014 RSI: 00007ffd15dd6460 RDI:
000000000000000b
kernel: RBP: 0000000000000000 R08: 0000000000000000 R09:
0000000000000000
kernel: R10: 0000000000004000 R11: 0000000000000246 R12:
0000000000000014
kernel: R13: 00007ffd15dd6630 R14: 0000000000000000 R15:
0000000000000001
kernel:  </TASK>
kernel: Future hung task reports are suppressed, see sysctl
kernel.hung_task_warnings
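
To make the reports above easier to scan, here is a minimal Python
sketch that lists each hung task together with the first function found
below __mutex_lock in its trace, i.e. the function that is waiting for
the lock. The function name summarize_hung_tasks, the regular
expressions and the SAMPLE text are illustrative assumptions, not part
of any existing tool. Frames marked "?" are skipped as unreliable, and
because only the start of each line is matched, the sketch should
tolerate the line wrapping seen in this mail.

```python
import re

# Matches the header of a hung-task report; the task name may itself contain ':'.
TASK_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+)")
# Matches one stack frame; group 1 is set for unreliable "? " frames.
FRAME_RE = re.compile(r"^kernel:\s+(\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_hung_tasks(log_text):
    """Return one dict per hung task: task name, pid, and the lock waiter."""
    results = []
    current = None
    after_lock = False
    for line in log_text.splitlines():
        header = TASK_RE.search(line)
        if header:
            current = {"task": header.group(1),
                       "pid": int(header.group(2)),
                       "waiter": None}
            results.append(current)
            after_lock = False
            continue
        frame = FRAME_RE.match(line)
        if frame is None or current is None or frame.group(1):
            continue  # not a frame, no report started yet, or a "?" frame
        name = frame.group(2)
        if name.startswith("__mutex_lock"):
            after_lock = True
        elif after_lock and current["waiter"] is None:
            current["waiter"] = name  # first reliable frame below the lock
    return results


# Illustrative excerpt copied from the log in this mail.
SAMPLE = """\
kernel: INFO: task kworker/5:1:250 blocked for more than 122 seconds.
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x3e8/0x1410
kernel:  ? sched_clock+0x10/0x30
kernel:  schedule+0x5e/0xd0
kernel:  schedule_preempt_disabled+0x15/0x30
kernel:  __mutex_lock.constprop.0+0x39a/0x6a0
kernel:  linkwatch_event+0x12/0x40
kernel:  process_one_work+0x171/0x340
"""

if __name__ == "__main__":
    for entry in summarize_hung_tasks(SAMPLE):
        print(entry)
```

Run against the full log, this kind of summary makes it easy to tell
the tasks that are only waiting in generic netlink or worker code from
the one whose waiter is a driver or PHY function.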

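From the call traces, the issue seems to be caused by commit
621735f590643e3048ca2060c285b80551660601 (r8169: fix rare issue with
broken rx after link-down on RTL8125), which was backported to 6.6.2.
In the NetworkManager trace, rtl8169_change_mtu calls phy_start_aneg,
which reaches r8169_phylink_handler and rtl_reset_work, which in turn
calls phy_start_aneg again and blocks in __mutex_lock. That looks like
the inner call waiting on a mutex the outer call already holds.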
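
The pattern visible in the NetworkManager trace (phy_start_aneg ->
r8169_phylink_handler -> rtl_reset_work -> phy_start_aneg) can be
illustrated with a small userspace sketch in Python. This is not kernel
code: the function names are borrowed from the trace only for
readability, phy_lock is a hypothetical stand-in for whatever mutex
phy_start_aneg takes, and a non-blocking acquire is used so that the
sketch reports the re-entry instead of hanging the way the kernel does.

```python
import threading

# Hypothetical stand-in for the PHY mutex: non-recursive, so the same
# thread cannot take it twice.
phy_lock = threading.Lock()
events = []


def phy_start_aneg(link_change_cb=None):
    """Take the lock, then optionally invoke a link-change callback."""
    # Non-blocking acquire: a blocking acquire here would hang forever
    # on re-entry, which is the behaviour seen in the hung-task report.
    if not phy_lock.acquire(blocking=False):
        events.append("blocked: lock already held")
        return False
    try:
        events.append("lock taken")
        if link_change_cb is not None:
            link_change_cb()  # callback runs while the lock is still held
        return True
    finally:
        phy_lock.release()


def reset_work():
    # The reset path restarts autonegotiation, re-entering phy_start_aneg.
    phy_start_aneg()


def phylink_handler():
    # Driver callback invoked on link change; it triggers the reset path.
    reset_work()


def change_mtu():
    # Entry point corresponding to the MTU change in the trace.
    return phy_start_aneg(link_change_cb=phylink_handler)


if __name__ == "__main__":
    change_mtu()
    print(events)
```

With a blocking lock, the inner call never returns, so the outer caller
keeps every lock it holds and anything else that needs those locks
queues up behind it. That would match the other blocked tasks in the
log, although the sketch itself does not prove it.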

Ian
