Date: Sat, 25 Nov 2023 15:58:17 +0100
From: Heiner Kallweit <hkallweit1@...il.com>
To: Ian Chen <free122448@...mail.com>, netdev@...r.kernel.org
Subject: Re: [BUG] r8169: deadlock when NetworkManager brings link up

On 25.11.2023 14:55, Ian Chen wrote:
> Hello,
> 
> My home server runs Arch Linux with its stock kernel on a GIGABYTE Z790
> AORUS ELITE AX with its builtin RTL8125B ethernet adapter.
> 
> After upgrading from 6.6.1.arch1 to 6.6.2.arch1, booting up the system
> would end up in a state where all operations on any netlink socket
> would block forever. The system is effectively unusable. Here's the
> relevant dmesg:
> 
> kernel: INFO: task kworker/u64:2:218 blocked for more than 122 seconds.
> kernel:       Not tainted 6.6.2-arch1-1 #1
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> kernel: task:kworker/u64:2   state:D stack:0     pid:218   ppid:2     
> flags:0x00004000
> kernel: Workqueue: events_power_efficient crda_timeout_work [cfg80211]
> kernel: Call Trace:
> kernel:  <TASK>
> kernel:  __schedule+0x3e8/0x1410
> kernel:  schedule+0x5e/0xd0
> kernel:  schedule_preempt_disabled+0x15/0x30
> kernel:  __mutex_lock.constprop.0+0x39a/0x6a0
> kernel:  crda_timeout_work+0x10/0x40 [cfg80211
> d1ff02bd631e7b94dc4a8630ea4cdb5aede1cb9b]
> kernel:  process_one_work+0x171/0x340
> kernel:  worker_thread+0x27b/0x3a0
> kernel:  ? __pfx_worker_thread+0x10/0x10
> kernel:  kthread+0xe5/0x120
> kernel:  ? __pfx_kthread+0x10/0x10
> kernel:  ret_from_fork+0x31/0x50
> kernel:  ? __pfx_kthread+0x10/0x10
> kernel:  ret_from_fork_asm+0x1b/0x30
> kernel:  </TASK>
> kernel: INFO: task kworker/5:1:250 blocked for more than 122 seconds.
> kernel:       Not tainted 6.6.2-arch1-1 #1
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> kernel: task:kworker/5:1     state:D stack:0     pid:250   ppid:2     
> flags:0x00004000
> kernel: Workqueue: events linkwatch_event
> kernel: Call Trace:
> kernel:  <TASK>
> kernel:  __schedule+0x3e8/0x1410
> kernel:  ? sched_clock+0x10/0x30
> kernel:  schedule+0x5e/0xd0
> kernel:  schedule_preempt_disabled+0x15/0x30
> kernel:  __mutex_lock.constprop.0+0x39a/0x6a0
> kernel:  linkwatch_event+0x12/0x40
> kernel:  process_one_work+0x171/0x340
> kernel:  worker_thread+0x27b/0x3a0
> kernel:  ? __pfx_worker_thread+0x10/0x10
> kernel:  kthread+0xe5/0x120
> kernel:  ? __pfx_kthread+0x10/0x10
> kernel:  ret_from_fork+0x31/0x50
> kernel:  ? __pfx_kthread+0x10/0x10
> kernel:  ret_from_fork_asm+0x1b/0x30
> kernel:  </TASK>
> kernel: INFO: task kworker/u64:6:290 blocked for more than 122 seconds.
> kernel:       Not tainted 6.6.2-arch1-1 #1
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> kernel: task:kworker/u64:6   state:D stack:0     pid:290   ppid:2     
> flags:0x00004000
> kernel: Workqueue: netns cleanup_net
> kernel: Call Trace:
> kernel:  <TASK>
> kernel:  __schedule+0x3e8/0x1410
> kernel:  schedule+0x5e/0xd0
> kernel:  schedule_preempt_disabled+0x15/0x30
> kernel:  __mutex_lock.constprop.0+0x39a/0x6a0
> kernel:  wg_netns_pre_exit+0x19/0x100 [wireguard
> 0c090e6018e49e49957d27fd2202b1db304881dc]
> kernel:  cleanup_net+0x1e0/0x3b0
> kernel:  process_one_work+0x171/0x340
> kernel:  worker_thread+0x27b/0x3a0
> kernel:  ? __pfx_worker_thread+0x10/0x10
> kernel:  kthread+0xe5/0x120
> kernel:  ? __pfx_kthread+0x10/0x10
> kernel:  ret_from_fork+0x31/0x50
> kernel:  ? __pfx_kthread+0x10/0x10
> kernel:  ret_from_fork_asm+0x1b/0x30
> kernel:  </TASK>
> kernel: INFO: task kworker/u64:19:577 blocked for more than 122
> seconds.
> kernel:       Not tainted 6.6.2-arch1-1 #1
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> kernel: task:kworker/u64:19  state:D stack:0     pid:577   ppid:2     
> flags:0x00004000
> kernel: Workqueue: events_power_efficient reg_check_chans_work
> [cfg80211]
> kernel: Call Trace:
> kernel:  <TASK>
> kernel:  __schedule+0x3e8/0x1410
> kernel:  ? _get_random_bytes+0xc0/0x1a0
> kernel:  schedule+0x5e/0xd0
> kernel:  schedule_preempt_disabled+0x15/0x30
> kernel:  __mutex_lock.constprop.0+0x39a/0x6a0
> kernel:  ? finish_task_switch.isra.0+0x94/0x2f0
> kernel:  reg_check_chans_work+0x31/0x5b0 [cfg80211
> d1ff02bd631e7b94dc4a8630ea4cdb5aede1cb9b]
> kernel:  process_one_work+0x171/0x340
> kernel:  worker_thread+0x27b/0x3a0
> kernel:  ? __pfx_worker_thread+0x10/0x10
> kernel:  kthread+0xe5/0x120
> kernel:  ? __pfx_kthread+0x10/0x10
> kernel:  ret_from_fork+0x31/0x50
> kernel:  ? __pfx_kthread+0x10/0x10
> kernel:  ret_from_fork_asm+0x1b/0x30
> kernel:  </TASK>
> kernel: INFO: task kworker/u64:23:581 blocked for more than 122
> seconds.
> kernel:       Not tainted 6.6.2-arch1-1 #1
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> kernel: task:kworker/u64:23  state:D stack:0     pid:581   ppid:2     
> flags:0x00004000
> kernel: Workqueue: events_power_efficient phy_state_machine [libphy]
> kernel: Call Trace:
> kernel:  <TASK>
> kernel:  __schedule+0x3e8/0x1410
> kernel:  schedule+0x5e/0xd0
> kernel:  schedule_preempt_disabled+0x15/0x30
> kernel:  __mutex_lock.constprop.0+0x39a/0x6a0
> kernel:  phy_state_machine+0x47/0x2c0 [libphy
> 93248cd1d88abf54f1b4cc64a990177f549a7710]
> kernel:  process_one_work+0x171/0x340
> kernel:  worker_thread+0x27b/0x3a0
> kernel:  ? __pfx_worker_thread+0x10/0x10
> kernel:  kthread+0xe5/0x120
> kernel:  ? __pfx_kthread+0x10/0x10
> kernel:  ret_from_fork+0x31/0x50
> kernel:  ? __pfx_kthread+0x10/0x10
> kernel:  ret_from_fork_asm+0x1b/0x30
> kernel:  </TASK>
> kernel: INFO: task NetworkManager:849 blocked for more than 122
> seconds.
> kernel:       Not tainted 6.6.2-arch1-1 #1
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> kernel: task:NetworkManager  state:D stack:0     pid:849   ppid:1     
> flags:0x00004002
> kernel: Call Trace:
> kernel:  <TASK>
> kernel:  __schedule+0x3e8/0x1410
> kernel:  ? sysvec_apic_timer_interrupt+0xe/0x90
> kernel:  schedule+0x5e/0xd0
> kernel:  schedule_preempt_disabled+0x15/0x30
> kernel:  __mutex_lock.constprop.0+0x39a/0x6a0
> kernel:  ? pci_conf1_write+0xae/0xf0
> kernel:  ? pcie_set_readrq+0x8e/0x160
> kernel:  phy_start_aneg+0x1d/0x40 [libphy
> 93248cd1d88abf54f1b4cc64a990177f549a7710]
> kernel:  rtl_reset_work+0x1bd/0x3b0 [r8169
> 08653ab60f23923c3943d53f140b2b697e265b93]
> kernel:  r8169_phylink_handler+0x5b/0x240 [r8169
> 08653ab60f23923c3943d53f140b2b697e265b93]
> kernel:  phy_link_change+0x2e/0x60 [libphy
> 93248cd1d88abf54f1b4cc64a990177f549a7710]
> kernel:  phy_check_link_status+0xad/0xe0 [libphy
> 93248cd1d88abf54f1b4cc64a990177f549a7710]
> kernel:  phy_start_aneg+0x25/0x40 [libphy
> 93248cd1d88abf54f1b4cc64a990177f549a7710]
> kernel:  rtl8169_change_mtu+0x24/0x60 [r8169
> 08653ab60f23923c3943d53f140b2b697e265b93]
> kernel:  dev_set_mtu_ext+0xf1/0x200
> kernel:  ? select_task_rq_fair+0x82c/0x1dd0
> kernel:  do_setlink+0x291/0x12d0
> kernel:  ? remove_entity_load_avg+0x31/0x80
> kernel:  ? sched_clock+0x10/0x30
> kernel:  ? sched_clock_cpu+0xf/0x190
> kernel:  ? __smp_call_single_queue+0xad/0x120
> kernel:  ? ttwu_queue_wakelist+0xef/0x110
> kernel:  ? __nla_validate_parse+0x61/0xd10
> kernel:  ? try_to_wake_up+0x2b7/0x640
> kernel:  __rtnl_newlink+0x651/0xa10
> kernel:  ? __kmem_cache_alloc_node+0x1a6/0x340
> kernel:  ? rtnl_newlink+0x2e/0x70
> kernel:  rtnl_newlink+0x47/0x70
> kernel:  rtnetlink_rcv_msg+0x14f/0x3c0
> kernel:  ? number+0x33b/0x3d0
> kernel:  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> kernel:  netlink_rcv_skb+0x58/0x110
> kernel:  netlink_unicast+0x1a3/0x290
> kernel:  netlink_sendmsg+0x254/0x4d0
> kernel:  ____sys_sendmsg+0x396/0x3d0
> kernel:  ? copy_msghdr_from_user+0x7d/0xc0
> kernel:  ___sys_sendmsg+0x9a/0xe0
> kernel:  __sys_sendmsg+0x7a/0xd0
> kernel:  do_syscall_64+0x5d/0x90
> kernel:  ? do_syscall_64+0x6c/0x90
> kernel:  ? do_syscall_64+0x6c/0x90
> kernel:  ? do_syscall_64+0x6c/0x90
> kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> kernel: RIP: 0033:0x7fc9232e7b3d
> kernel: RSP: 002b:00007fffd4df2830 EFLAGS: 00000293 ORIG_RAX:
> 000000000000002e
> kernel: RAX: ffffffffffffffda RBX: 0000000000000055 RCX:
> 00007fc9232e7b3d
> kernel: RDX: 0000000000000000 RSI: 00007fffd4df2870 RDI:
> 000000000000000d
> kernel: RBP: 00007fffd4df2c40 R08: 0000000000000000 R09:
> 0000000000000000
> kernel: R10: 0000000000000000 R11: 0000000000000293 R12:
> 0000563fe71367c0
> kernel: R13: 0000000000000001 R14: 0000000000000000 R15:
> 0000000000000000
> kernel:  </TASK>
> kernel: INFO: task geoclue:1358 blocked for more than 122 seconds.
> kernel:       Not tainted 6.6.2-arch1-1 #1
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> kernel: task:geoclue         state:D stack:0     pid:1358  ppid:1     
> flags:0x00000002
> kernel: Call Trace:
> kernel:  <TASK>
> kernel:  __schedule+0x3e8/0x1410
> kernel:  schedule+0x5e/0xd0
> kernel:  schedule_preempt_disabled+0x15/0x30
> kernel:  __mutex_lock.constprop.0+0x39a/0x6a0
> kernel:  __netlink_dump_start+0x75/0x290
> kernel:  ? __pfx_rtnl_dump_all+0x10/0x10
> kernel:  rtnetlink_rcv_msg+0x277/0x3c0
> kernel:  ? __pfx_rtnl_dump_all+0x10/0x10
> kernel:  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> kernel:  netlink_rcv_skb+0x58/0x110
> kernel:  netlink_unicast+0x1a3/0x290
> kernel:  netlink_sendmsg+0x254/0x4d0
> kernel:  __sys_sendto+0x1f6/0x200
> kernel:  __x64_sys_sendto+0x24/0x30
> kernel:  do_syscall_64+0x5d/0x90
> kernel:  ? do_syscall_64+0x6c/0x90
> kernel:  ? do_syscall_64+0x6c/0x90
> kernel:  ? syscall_exit_to_user_mode+0x2b/0x40
> kernel:  ? do_syscall_64+0x6c/0x90
> kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> kernel: RIP: 0033:0x7f977ae729ec
> kernel: RSP: 002b:00007ffeeb6aba50 EFLAGS: 00000246 ORIG_RAX:
> 000000000000002c
> kernel: RAX: ffffffffffffffda RBX: 000056084849e910 RCX:
> 00007f977ae729ec
> kernel: RDX: 0000000000000014 RSI: 00007ffeeb6abad0 RDI:
> 0000000000000007
> kernel: RBP: 0000000000000000 R08: 0000000000000000 R09:
> 0000000000000000
> kernel: R10: 0000000000004000 R11: 0000000000000246 R12:
> 0000000000000014
> kernel: R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000000
> kernel:  </TASK>
> kernel: INFO: task pool-gnome-shel:1986 blocked for more than 122
> seconds.
> kernel:       Not tainted 6.6.2-arch1-1 #1
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> kernel: task:pool-gnome-shel state:D stack:0     pid:1986  ppid:1513  
> flags:0x00000002
> kernel: Call Trace:
> kernel:  <TASK>
> kernel:  __schedule+0x3e8/0x1410
> kernel:  schedule+0x5e/0xd0
> kernel:  schedule_preempt_disabled+0x15/0x30
> kernel:  __mutex_lock.constprop.0+0x39a/0x6a0
> kernel:  __netlink_dump_start+0x75/0x290
> kernel:  ? __pfx_rtnl_dump_all+0x10/0x10
> kernel:  rtnetlink_rcv_msg+0x277/0x3c0
> kernel:  ? __pfx_rtnl_dump_all+0x10/0x10
> kernel:  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> kernel:  netlink_rcv_skb+0x58/0x110
> kernel:  netlink_unicast+0x1a3/0x290
> kernel:  netlink_sendmsg+0x254/0x4d0
> kernel:  __sys_sendto+0x1f6/0x200
> kernel:  __x64_sys_sendto+0x24/0x30
> kernel:  do_syscall_64+0x5d/0x90
> kernel:  ? syscall_exit_to_user_mode+0x2b/0x40
> kernel:  ? do_syscall_64+0x6c/0x90
> kernel:  ? exc_page_fault+0x7f/0x180
> kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> kernel: RIP: 0033:0x7f232af30bfc
> kernel: RSP: 002b:00007f223e1fbba0 EFLAGS: 00000293 ORIG_RAX:
> 000000000000002c
> kernel: RAX: ffffffffffffffda RBX: 00007f223e1fccc0 RCX:
> 00007f232af30bfc
> kernel: RDX: 0000000000000014 RSI: 00007f223e1fccc0 RDI:
> 0000000000000028
> kernel: RBP: 0000000000000000 R08: 00007f223e1fcc64 R09:
> 000000000000000c
> kernel: R10: 0000000000000000 R11: 0000000000000293 R12:
> 0000000000000028
> kernel: R13: 00007f223e1fcc80 R14: 0000000000000665 R15:
> 000055638262fd10
> kernel:  </TASK>
> kernel: INFO: task evolution-sourc:1819 blocked for more than 122
> seconds.
> kernel:       Not tainted 6.6.2-arch1-1 #1
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> kernel: task:evolution-sourc state:D stack:0     pid:1819  ppid:1513  
> flags:0x00000006
> kernel: Call Trace:
> kernel:  <TASK>
> kernel:  __schedule+0x3e8/0x1410
> kernel:  schedule+0x5e/0xd0
> kernel:  schedule_preempt_disabled+0x15/0x30
> kernel:  __mutex_lock.constprop.0+0x39a/0x6a0
> kernel:  ? netlink_lookup+0x151/0x1d0
> kernel:  __netlink_dump_start+0x75/0x290
> kernel:  ? __pfx_rtnl_dump_all+0x10/0x10
> kernel:  rtnetlink_rcv_msg+0x277/0x3c0
> kernel:  ? __pfx_rtnl_dump_all+0x10/0x10
> kernel:  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> kernel:  netlink_rcv_skb+0x58/0x110
> kernel:  netlink_unicast+0x1a3/0x290
> kernel:  netlink_sendmsg+0x254/0x4d0
> kernel:  __sys_sendto+0x1f6/0x200
> kernel:  __x64_sys_sendto+0x24/0x30
> kernel:  do_syscall_64+0x5d/0x90
> kernel:  ? do_syscall_64+0x6c/0x90
> kernel:  ? sock_getsockopt+0x22/0x30
> kernel:  ? __fget_light+0x99/0x100
> kernel:  ? __sys_setsockopt+0x129/0x1d0
> kernel:  ? syscall_exit_to_user_mode+0x2b/0x40
> kernel:  ? do_syscall_64+0x6c/0x90
> kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> kernel: RIP: 0033:0x7f6aa096c9ec
> kernel: RSP: 002b:00007fff2b442820 EFLAGS: 00000246 ORIG_RAX:
> 000000000000002c
> kernel: RAX: ffffffffffffffda RBX: 0000561e6b466d80 RCX:
> 00007f6aa096c9ec
> kernel: RDX: 0000000000000014 RSI: 00007fff2b4428a0 RDI:
> 000000000000000a
> kernel: RBP: 0000000000000000 R08: 0000000000000000 R09:
> 0000000000000000
> kernel: R10: 0000000000004000 R11: 0000000000000246 R12:
> 0000000000000014
> kernel: R13: 00007fff2b442a70 R14: 0000000000000000 R15:
> 0000000000000001
> kernel:  </TASK>
> kernel: INFO: task gnome-software:1904 blocked for more than 122
> seconds.
> kernel:       Not tainted 6.6.2-arch1-1 #1
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> kernel: task:gnome-software  state:D stack:0     pid:1904  ppid:1613  
> flags:0x00000002
> kernel: Call Trace:
> kernel:  <TASK>
> kernel:  __schedule+0x3e8/0x1410
> kernel:  ? __pte_offset_map_lock+0x9e/0x110
> kernel:  schedule+0x5e/0xd0
> kernel:  schedule_preempt_disabled+0x15/0x30
> kernel:  __mutex_lock.constprop.0+0x39a/0x6a0
> kernel:  ? netlink_lookup+0x151/0x1d0
> kernel:  __netlink_dump_start+0x75/0x290
> kernel:  ? __pfx_rtnl_dump_all+0x10/0x10
> kernel:  rtnetlink_rcv_msg+0x277/0x3c0
> kernel:  ? __pfx_rtnl_dump_all+0x10/0x10
> kernel:  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> kernel:  netlink_rcv_skb+0x58/0x110
> kernel:  netlink_unicast+0x1a3/0x290
> kernel:  netlink_sendmsg+0x254/0x4d0
> kernel:  __sys_sendto+0x1f6/0x200
> kernel:  __x64_sys_sendto+0x24/0x30
> kernel:  do_syscall_64+0x5d/0x90
> kernel:  ? __fget_light+0x99/0x100
> kernel:  ? __sys_setsockopt+0x129/0x1d0
> kernel:  ? syscall_exit_to_user_mode+0x2b/0x40
> kernel:  ? do_syscall_64+0x6c/0x90
> kernel:  ? syscall_exit_to_user_mode+0x2b/0x40
> kernel:  ? do_syscall_64+0x6c/0x90
> kernel:  ? exc_page_fault+0x7f/0x180
> kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> kernel: RIP: 0033:0x7fdbfd26d9ec
> kernel: RSP: 002b:00007ffd15dd63e0 EFLAGS: 00000246 ORIG_RAX:
> 000000000000002c
> kernel: RAX: ffffffffffffffda RBX: 000056133c78f580 RCX:
> 00007fdbfd26d9ec
> kernel: RDX: 0000000000000014 RSI: 00007ffd15dd6460 RDI:
> 000000000000000b
> kernel: RBP: 0000000000000000 R08: 0000000000000000 R09:
> 0000000000000000
> kernel: R10: 0000000000004000 R11: 0000000000000246 R12:
> 0000000000000014
> kernel: R13: 00007ffd15dd6630 R14: 0000000000000000 R15:
> 0000000000000001
> kernel:  </TASK>
> kernel: Future hung task reports are suppressed, see sysctl
> kernel.hung_task_warnings
> 
> From the call traces, it seems that the issue is caused by commit
> 621735f590643e3048ca2060c285b80551660601 (r8169: fix rare issue with
> broken rx after link-down on RTL8125), which got backported to 6.6.2.
> 
Thanks for the report. The issue seems to be caused by a recursive call
to phy_start_aneg(), and it seems to be specific to using a jumbo MTU.
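
For illustration, the NetworkManager trace above shows phy_start_aneg()
twice on the same stack: once called from rtl8169_change_mtu(), and again
from rtl_reset_work() via the link-change callback. The following is a
minimal user-space sketch of that pattern, not the driver code. The Phy
class, reset_handler() and demo() are hypothetical names, and a timeout
stands in for the indefinite block seen in the kernel, on the assumption
that the lock involved is a plain non-recursive mutex.

```python
import threading


class Phy:
    """Toy model of a PHY device guarded by a non-reentrant lock."""

    def __init__(self):
        self.lock = threading.Lock()   # like a kernel mutex: not recursive
        self.link_change_cb = None     # hypothetical link-change hook

    def start_aneg(self, timeout=0.2):
        # The real code would block forever here; the timeout only exists
        # so that this sketch terminates and can report the failure.
        if not self.lock.acquire(timeout=timeout):
            return False               # lock is already held: self-deadlock
        try:
            # The link-status check runs with the lock held and may invoke
            # the driver's link-change callback.
            if self.link_change_cb is not None:
                return self.link_change_cb(self)
            return True
        finally:
            self.lock.release()


def reset_handler(phy):
    # Stand-in for a reset path that restarts autonegotiation from inside
    # the callback, re-entering start_aneg() while the lock is held.
    return phy.start_aneg()


def demo():
    plain = Phy()
    ok_without_callback = plain.start_aneg()

    phy = Phy()
    phy.link_change_cb = reset_handler
    ok_with_reentry = phy.start_aneg()
    return ok_without_callback, ok_with_reentry


if __name__ == "__main__":
    print(demo())
```

In the sketch the inner call gives up after the timeout. In the kernel the
task simply stays in state D, and since the call came in through
rtnl_newlink(), every other rtnetlink user ends up waiting behind it.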

Are you using a jumbo MTU? And could you please check whether the issue
is gone with the standard MTU?
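
One possible way to check the configured MTU is to read it from sysfs.
This is only a sketch: the interface name "eth0" is a placeholder for
whatever name the RTL8125B port has on the affected system, and the
helper names are made up for this example. An MTU above the standard
Ethernet value of 1500 is treated as jumbo here.

```python
from pathlib import Path

STANDARD_MTU = 1500  # standard Ethernet MTU; larger values count as jumbo


def read_mtu(iface, sysfs_root="/sys/class/net"):
    """Return the MTU that sysfs reports for the given interface."""
    return int((Path(sysfs_root) / iface / "mtu").read_text().strip())


def is_jumbo(mtu):
    """True if the MTU is larger than the standard Ethernet MTU."""
    return mtu > STANDARD_MTU


if __name__ == "__main__":
    iface = "eth0"  # placeholder, replace with the real interface name
    mtu = read_mtu(iface)
    print(f"{iface}: mtu={mtu} jumbo={is_jumbo(mtu)}")
```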

> Ian

Heiner
