Message-ID: <20241224192616.GI171473@unreal>
Date: Tue, 24 Dec 2024 21:26:16 +0200
From: Leon Romanovsky <leon@...nel.org>
To: Lin Ma <linma@....edu.cn>
Cc: jgg@...pe.ca, cmeiohas@...dia.com, michaelgur@...dia.com,
	linux-rdma@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [bug report] RDMA/iwpm: reentrant iwpm hello message

On Wed, Dec 25, 2024 at 12:16:45AM +0800, Lin Ma wrote:
> Hello Leon,
> 
> > > Please let me know if I understand this correctly or incorrectly?
> > 
> > > The thing is that down_write() is called when we are unregistering the
> > > module which sent the netlink messages. It shouldn't happen.
> > 
> 
> I acknowledge that this is a low-probability event. However, the race
> condition still exists; otherwise, these read and write semaphores
> would not be necessary. Why not just remove all of them?

Netlink input and module removal are different paths that can run in
parallel, and the semaphore protects against exactly that race.

Do you have a reproducer for that?

> 
> Moreover, I find that even without the deadlock, this reentrant message
> would hang the kernel and cannot be killed, with logs like below:
> (after disabling locking sanitizer, tested in latest ubuntu)
> 
> [2187983.899998] INFO: task poc.elf:1717021 blocked for more than 122 seconds.
> [2187983.900049]       Not tainted 6.8.0-49-generic #49~22.04.1-Ubuntu
> [2187983.900057] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [2187983.900063] task:poc.elf       state:D stack:0     pid:1717021 tgid:1717021 ppid:1716834 flags:0x00004006
> [2187983.900087] Call Trace:
> [2187983.900094]  <TASK>
> [2187983.900355]  __schedule+0x27c/0x6a0
> [2187983.900430]  schedule+0x33/0x110
> [2187983.900442]  schedule_preempt_disabled+0x15/0x30
> [2187983.900454]  __mutex_lock.constprop.0+0x3f8/0x7a0
> [2187983.900476]  __mutex_lock_slowpath+0x13/0x20
> [2187983.900486]  mutex_lock+0x3c/0x50
> [2187983.900493]  __netlink_dump_start+0x76/0x2a0
> [2187983.900552]  rdma_nl_rcv_msg+0x24c/0x310 [ib_core]
> [2187983.900673]  ? __pfx_iwpm_hello_cb+0x10/0x10 [iw_cm]
> [2187983.900699]  rdma_nl_rcv_skb.constprop.0.isra.0+0xbb/0x120 [ib_core]
> [2187983.900802]  rdma_nl_rcv+0xe/0x20 [ib_core]
> [2187983.900898]  netlink_unicast+0x1b0/0x2a0
> [2187983.900911]  rdma_nl_unicast+0x49/0x70 [ib_core]
> [2187983.901005]  iwpm_send_hello+0xfd/0x150 [iw_cm]
> [2187983.901030]  iwpm_hello_cb+0xb9/0x130 [iw_cm]
> [2187983.901052]  netlink_dump+0x1c0/0x340
> [2187983.901065]  __netlink_dump_start+0x1ef/0x2a0
> [2187983.901077]  rdma_nl_rcv_msg+0x24c/0x310 [ib_core]
> [2187983.901219]  ? __pfx_iwpm_hello_cb+0x10/0x10 [iw_cm]
> [2187983.901245]  rdma_nl_rcv_skb.constprop.0.isra.0+0xbb/0x120 [ib_core]
> [2187983.901344]  rdma_nl_rcv+0xe/0x20 [ib_core]
> [2187983.901437]  netlink_unicast+0x1b0/0x2a0
> [2187983.901449]  rdma_nl_unicast+0x49/0x70 [ib_core]
> [2187983.901544]  iwpm_send_hello+0xfd/0x150 [iw_cm]
> [2187983.901567]  iwpm_hello_cb+0xb9/0x130 [iw_cm]
> [2187983.901589]  netlink_dump+0x1c0/0x340
> [2187983.901602]  __netlink_dump_start+0x1ef/0x2a0
> [2187983.901613]  rdma_nl_rcv_msg+0x24c/0x310 [ib_core]
> [2187983.901707]  ? __pfx_iwpm_hello_cb+0x10/0x10 [iw_cm]
> [2187983.901731]  rdma_nl_rcv_skb.constprop.0.isra.0+0xbb/0x120 [ib_core]
> [2187983.901830]  rdma_nl_rcv+0xe/0x20 [ib_core]
> [2187983.901922]  netlink_unicast+0x1b0/0x2a0
> [2187983.901933]  netlink_sendmsg+0x214/0x470
> [2187983.901946]  __sys_sendto+0x21b/0x230
> [2187983.901992]  __x64_sys_sendto+0x24/0x40
> [2187983.902002]  x64_sys_call+0x1fc0/0x24b0
> [2187983.902023]  do_syscall_64+0x81/0x170
> [2187983.902059]  ? security_file_alloc+0x5f/0xf0
> [2187983.902079]  ? alloc_empty_file+0x85/0x130
> [2187983.902140]  ? alloc_file+0x9b/0x170
> [2187983.902150]  ? alloc_file_pseudo+0x9e/0x100
> [2187983.902163]  ? restore_fpregs_from_fpstate+0x3d/0xd0
> [2187983.902197]  ? switch_fpu_return+0x55/0xf0
> [2187983.902208]  ? syscall_exit_to_user_mode+0x83/0x260
> [2187983.902229]  ? do_syscall_64+0x8d/0x170
> [2187983.902240]  ? irqentry_exit+0x43/0x50
> [2187983.902249]  ? clear_bhb_loop+0x15/0x70
> [2187983.902293]  ? clear_bhb_loop+0x15/0x70
> [2187983.902302]  ? clear_bhb_loop+0x15/0x70
> [2187983.902311]  entry_SYSCALL_64_after_hwframe+0x78/0x80
> [2187983.902319] RIP: 0033:0x440624
> [2187983.902582] RSP: 002b:00007ffcfa4b29f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
> [2187983.902592] RAX: ffffffffffffffda RBX: 0000000000400400 RCX: 0000000000440624
> [2187983.902598] RDX: 0000000000000018 RSI: 00007ffcfa4b2a30 RDI: 0000000000000003
> [2187983.902604] RBP: 00007ffcfa4b3a40 R08: 000000000047df08 R09: 000000000000000c
> [2187983.902609] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000403990
> [2187983.902614] R13: 0000000000000000 R14: 00000000006a6018 R15: 0000000000000000
> 
> That's why I'm quite sure this is a bug and requires fixing.
> 
> Thanks
> Lin
