Message-ID: <48307bf.eecb.193f974dadf.Coremail.linma@zju.edu.cn>
Date: Wed, 25 Dec 2024 00:16:45 +0800 (GMT+08:00)
From: "Lin Ma" <linma@....edu.cn>
To: "Leon Romanovsky" <leon@...nel.org>
Cc: jgg@...pe.ca, cmeiohas@...dia.com, michaelgur@...dia.com,
	linux-rdma@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [bug report] RDMA/iwpm: reentrant iwpm hello message

Hello Leon,

> > Please let me know if I understand this correctly or incorrectly?
> 
> The thing is that down_write() is called when we unregistering module
> which sent netlink messages. It shouldn't happen.
> 

I acknowledge that this is a low-probability event. However, the race
condition still exists; otherwise, these read and write semaphores
would not be necessary. Why not just remove all of them?

Moreover, I find that even without the deadlock, this reentrant message
causes the sending task to hang in the kernel, where it cannot be killed.
The log looks like the one below (captured after disabling the locking
sanitizer, tested on the latest Ubuntu):

[2187983.899998] INFO: task poc.elf:1717021 blocked for more than 122 seconds.
[2187983.900049]       Not tainted 6.8.0-49-generic #49~22.04.1-Ubuntu
[2187983.900057] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2187983.900063] task:poc.elf       state:D stack:0     pid:1717021 tgid:1717021 ppid:1716834 flags:0x00004006
[2187983.900087] Call Trace:
[2187983.900094]  <TASK>
[2187983.900355]  __schedule+0x27c/0x6a0
[2187983.900430]  schedule+0x33/0x110
[2187983.900442]  schedule_preempt_disabled+0x15/0x30
[2187983.900454]  __mutex_lock.constprop.0+0x3f8/0x7a0
[2187983.900476]  __mutex_lock_slowpath+0x13/0x20
[2187983.900486]  mutex_lock+0x3c/0x50
[2187983.900493]  __netlink_dump_start+0x76/0x2a0
[2187983.900552]  rdma_nl_rcv_msg+0x24c/0x310 [ib_core]
[2187983.900673]  ? __pfx_iwpm_hello_cb+0x10/0x10 [iw_cm]
[2187983.900699]  rdma_nl_rcv_skb.constprop.0.isra.0+0xbb/0x120 [ib_core]
[2187983.900802]  rdma_nl_rcv+0xe/0x20 [ib_core]
[2187983.900898]  netlink_unicast+0x1b0/0x2a0
[2187983.900911]  rdma_nl_unicast+0x49/0x70 [ib_core]
[2187983.901005]  iwpm_send_hello+0xfd/0x150 [iw_cm]
[2187983.901030]  iwpm_hello_cb+0xb9/0x130 [iw_cm]
[2187983.901052]  netlink_dump+0x1c0/0x340
[2187983.901065]  __netlink_dump_start+0x1ef/0x2a0
[2187983.901077]  rdma_nl_rcv_msg+0x24c/0x310 [ib_core]
[2187983.901219]  ? __pfx_iwpm_hello_cb+0x10/0x10 [iw_cm]
[2187983.901245]  rdma_nl_rcv_skb.constprop.0.isra.0+0xbb/0x120 [ib_core]
[2187983.901344]  rdma_nl_rcv+0xe/0x20 [ib_core]
[2187983.901437]  netlink_unicast+0x1b0/0x2a0
[2187983.901449]  rdma_nl_unicast+0x49/0x70 [ib_core]
[2187983.901544]  iwpm_send_hello+0xfd/0x150 [iw_cm]
[2187983.901567]  iwpm_hello_cb+0xb9/0x130 [iw_cm]
[2187983.901589]  netlink_dump+0x1c0/0x340
[2187983.901602]  __netlink_dump_start+0x1ef/0x2a0
[2187983.901613]  rdma_nl_rcv_msg+0x24c/0x310 [ib_core]
[2187983.901707]  ? __pfx_iwpm_hello_cb+0x10/0x10 [iw_cm]
[2187983.901731]  rdma_nl_rcv_skb.constprop.0.isra.0+0xbb/0x120 [ib_core]
[2187983.901830]  rdma_nl_rcv+0xe/0x20 [ib_core]
[2187983.901922]  netlink_unicast+0x1b0/0x2a0
[2187983.901933]  netlink_sendmsg+0x214/0x470
[2187983.901946]  __sys_sendto+0x21b/0x230
[2187983.901992]  __x64_sys_sendto+0x24/0x40
[2187983.902002]  x64_sys_call+0x1fc0/0x24b0
[2187983.902023]  do_syscall_64+0x81/0x170
[2187983.902059]  ? security_file_alloc+0x5f/0xf0
[2187983.902079]  ? alloc_empty_file+0x85/0x130
[2187983.902140]  ? alloc_file+0x9b/0x170
[2187983.902150]  ? alloc_file_pseudo+0x9e/0x100
[2187983.902163]  ? restore_fpregs_from_fpstate+0x3d/0xd0
[2187983.902197]  ? switch_fpu_return+0x55/0xf0
[2187983.902208]  ? syscall_exit_to_user_mode+0x83/0x260
[2187983.902229]  ? do_syscall_64+0x8d/0x170
[2187983.902240]  ? irqentry_exit+0x43/0x50
[2187983.902249]  ? clear_bhb_loop+0x15/0x70
[2187983.902293]  ? clear_bhb_loop+0x15/0x70
[2187983.902302]  ? clear_bhb_loop+0x15/0x70
[2187983.902311]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[2187983.902319] RIP: 0033:0x440624
[2187983.902582] RSP: 002b:00007ffcfa4b29f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[2187983.902592] RAX: ffffffffffffffda RBX: 0000000000400400 RCX: 0000000000440624
[2187983.902598] RDX: 0000000000000018 RSI: 00007ffcfa4b2a30 RDI: 0000000000000003
[2187983.902604] RBP: 00007ffcfa4b3a40 R08: 000000000047df08 R09: 000000000000000c
[2187983.902609] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000403990
[2187983.902614] R13: 0000000000000000 R14: 00000000006a6018 R15: 0000000000000000

That's why I'm quite sure this is a bug and requires fixing.
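
To make the pattern in the trace easier to see, here is a rough user-space
sketch. It is a toy model, not the kernel code: DumpSocket, dump_start and
hello_cb are made-up names, and the only thing taken from the log is the
shape of the problem (a callback that runs under a non-recursive mutex
re-enters the path that takes the same mutex). A real mutex_lock() would
sleep forever at that point; the sketch uses a non-blocking acquire so the
re-entry is reported instead of hanging.

```python
import threading


class DumpSocket:
    """Toy stand-in for a socket whose dump path is serialized by one mutex."""

    def __init__(self):
        # threading.Lock is non-recursive: the owner cannot take it twice.
        self.dump_mutex = threading.Lock()
        self.trace = []

    def dump_start(self, callback, depth=0):
        # Model of "take the mutex, then run the dump callback under it".
        # Non-blocking acquire stands in for a mutex_lock() that never returns.
        if not self.dump_mutex.acquire(blocking=False):
            self.trace.append(f"depth {depth}: mutex already held, would block forever")
            return False
        try:
            self.trace.append(f"depth {depth}: mutex taken, running callback")
            return callback(self, depth)
        finally:
            self.dump_mutex.release()


def hello_cb(sock, depth):
    # Hypothetical callback: it replies by sending a message that is routed
    # straight back into the same dump path, i.e. it re-enters dump_start().
    return sock.dump_start(hello_cb, depth + 1)


sock = DumpSocket()
ok = sock.dump_start(hello_cb)
print(ok)
print("\n".join(sock.trace))
```

Running it prints False followed by two trace lines: the outer call takes
the mutex, and the nested call finds it already held. In the kernel there is
no such escape hatch, so the task stays in state D as in the log above.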

Thanks
Lin
