lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Sun, 6 Feb 2022 23:09:50 +0800 From: Jia-Ju Bai <baijiaju1990@...il.com> To: Karsten Graul <kgraul@...ux.ibm.com> Cc: linux-s390@...r.kernel.org, netdev@...r.kernel.org, linux-kernel <linux-kernel@...r.kernel.org>, davem@...emloft.net, kuba@...nel.org Subject: Re: [BUG] net: smc: possible deadlock in smc_lgr_free() and smc_link_down_work() On 2022/2/2 1:06, Karsten Graul wrote: > On 01/02/2022 08:51, Jia-Ju Bai wrote: >> Hello, >> >> My static analysis tool reports a possible deadlock in the smc module in Linux 5.16: >> >> smc_lgr_free() >> mutex_lock(&lgr->llc_conf_mutex); --> Line 1289 (Lock A) >> smcr_link_clear() >> smc_wr_free_link() >> wait_event(lnk->wr_tx_wait, ...); --> Line 648 (Wait X) >> >> smc_link_down_work() >> mutex_lock(&lgr->llc_conf_mutex); --> Line 1683 (Lock A) >> smcr_link_down() >> smcr_link_clear() >> smc_wr_free_link() >> smc_wr_wakeup_tx_wait() >> wake_up_all(&lnk->wr_tx_wait); --> Line 78 (Wake X) >> >> When smc_lgr_free() is executed, "Wait X" is performed by holding "Lock A". If smc_link_down_work() is executed at this time, "Wake X" cannot be performed to wake up "Wait X" in smc_lgr_free(), because "Lock A" has been already hold by smc_lgr_free(), causing a possible deadlock. >> >> I am not quite sure whether this possible problem is real and how to fix it if it is real. >> Any feedback would be appreciated, thanks :) Hi Karsten, Thanks for the reply and explanation :) > A deeper analysis showed up that this reported possible deadlock is actually not a problem. > > The wait on line 648 in smc_wr.c > wait_event(lnk->wr_tx_wait, (!atomic_read(&lnk->wr_tx_refcnt))); > waits as long as the refcount wr_tx_refcnt is not zero. > > Every time when a caller stops using a link wr_tx_refcnt is decreased, and when it reaches > zero the wr_tx_wait is woken up in smc_wr_tx_link_put() in smc_wr.h, line 70: > if (atomic_dec_and_test(&link->wr_tx_refcnt)) > wake_up_all(&link->wr_tx_wait); Okay, you mean that wake_up_all(&link->wr_tx_wait) in smc_wr_tx_link_put() is used to wake up wait_event() in smc_wr_free_link(). But I wonder whether wake_up_all(&lnk->wr_tx_wait) in smc_wr_wakeup_tx_wait() can wake up this wait_event()? If so, my report is in this case. > Multiple callers of smc_wr_tx_link_put() do not run under the llc_conf_mutex lock, and those > who run under this mutex are saved against the wait_event() in smc_wr_free_link(). In fact, my tool also reports some other possible deadlocks invovling smc_wr_tx_link_put(), which can be called by holding llc_conf_mutex. There are three examples: #BUG 1 smc_lgr_free() mutex_lock(&lgr->llc_conf_mutex); --> Line 1289 (Lock A) smcr_link_clear() smc_wr_free_link() wait_event(lnk->wr_tx_wait, ...); --> Line 648 (Wait X) smcr_buf_unuse() mutex_lock(&lgr->llc_conf_mutex); --> Line 1087 (Lock A) smc_llc_do_delete_rkey() smc_llc_send_delete_rkey() smc_wr_tx_link_put() wake_up_all(&link->wr_tx_wait); --> Line 73 (Wake X) #BUG 2 smc_lgr_free() mutex_lock(&lgr->llc_conf_mutex); --> Line 1289 (Lock A) smcr_link_clear() smc_wr_free_link() wait_event(lnk->wr_tx_wait, ...); --> Line 648 (Wait X) smc_link_down_work() mutex_lock(&lgr->llc_conf_mutex); --> Line 1683 (Lock A) smcr_link_down() smc_llc_send_delete_link() smc_wr_tx_link_put() wake_up_all(&link->wr_tx_wait); --> Line 73 (Wake X) #BUG 3 smc_llc_process_cli_delete_link() mutex_lock(&lgr->llc_conf_mutex); --> Line 1578 (Lock A) smc_llc_send_message() smc_llc_add_pending_send() smc_wr_tx_get_free_slot() wait_event_interruptible_timeout(link->wr_tx_wait, ...); --> Line 219 (Wake X) smc_llc_process_cli_add_link() mutex_lock(&lgr->llc_conf_mutex); --> Line 1198 (Lock A) smc_llc_cli_add_link_invite() smc_llc_send_add_link() smc_wr_tx_link_put() wake_up_all(&link->wr_tx_wait); --> Line 73 (Wake X) I am not quite sure whether these possible problems are real. Any feedback would be appreciated, thanks :) > > Thank you for reporting this finding! Which tool did you use for this analysis? Thanks for your interest :) I have implemented a static analysis tool based on LLVM, to detect deadlocks caused by locking cycles and improper waiting/waking operations. However, this tool still reports some false positives, and thus I am still improving the accuracy of this tool. Suggestions on deadlock detection (especially new/infrequent patterns causing deadlocks) or the tool are welcome ;) Best wishes, Jia-Ju Bai
Powered by blists - more mailing lists