Message-ID: <b2573ccf2340a19b6cb039dac639b2d431c1404c.camel@redhat.com>
Date: Tue, 16 Apr 2024 14:06:42 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Zhengchao Shao <shaozhengchao@...wei.com>, linux-s390@...r.kernel.org,
netdev@...r.kernel.org, davem@...emloft.net, edumazet@...gle.com,
kuba@...nel.org
Cc: wenjia@...ux.ibm.com, jaka@...ux.ibm.com, alibuda@...ux.alibaba.com,
tonylu@...ux.alibaba.com, guwen@...ux.alibaba.com, weiyongjun1@...wei.com,
yuehaibing@...wei.com, tangchengchang@...wei.com
Subject: Re: [PATCH net] net/smc: fix potential sleeping issue in
smc_switch_conns
On Sat, 2024-04-13 at 11:51 +0800, Zhengchao Shao wrote:
> Potential sleeping issue exists in the following processes:
> smc_switch_conns
>   spin_lock_bh(&conn->send_lock)
>   smc_switch_link_and_count
>     smcr_link_put
>       __smcr_link_clear
>         smc_lgr_put
>           __smc_lgr_free
>             smc_lgr_free_bufs
>               __smc_lgr_free_bufs
>                 smc_buf_free
>                   smcr_buf_free
>                     smcr_buf_unmap_link
>                       smc_ib_put_memory_region
>                         ib_dereg_mr
>                           ib_dereg_mr_user
>                             mr->device->ops.dereg_mr
> If the IB driver's .dereg_mr hook function can schedule, the bug
> "scheduling while atomic" will occur, for example with the cxgb4 and
> efa drivers. Use a mutex lock instead of a spin lock to fix it.
I tried to inspect all the lock call sites, and it *looks* like they are
all in process context, so the switch should be feasible.
Still, the fact that the existing lock is a BH variant is suspect.
Either the BH part was not needed or this can introduce subtle
regressions/issues.
I think this deserves at least 3rd party testing.
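To make the failure mode in the quoted call chain concrete, here is a minimal userspace sketch in C. It is only a model: every model_* name, the atomic_depth counter, and run_switch_conns() are hypothetical stand-ins invented for illustration, not the kernel's locking primitives and not the SMC code. It assumes only what the patch description states, namely that the path under conn->send_lock can reach a .dereg_mr hook that may sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; none of these are real kernel primitives. */
static int atomic_depth;     /* > 0 models "running in atomic context" */
static int sleep_violations; /* modeled "scheduling while atomic" events */

/* A spinning lock (BH variant or not) puts the holder in atomic context. */
static void model_spin_lock_bh(void)
{
    atomic_depth++;
}

static void model_spin_unlock_bh(void)
{
    assert(atomic_depth > 0);
    atomic_depth--;
}

/* A sleeping lock does not enter atomic context, so the model is a no-op. */
static void model_mutex_lock(void) { }
static void model_mutex_unlock(void) { }

/* Stand-in for a driver .dereg_mr hook that may sleep. */
static void model_dereg_mr(void)
{
    if (atomic_depth > 0)
        sleep_violations++; /* sleeping here would be the reported bug */
}

/* Stand-in for the put/free chain that ends in .dereg_mr. */
static void model_switch_link_and_count(void)
{
    model_dereg_mr();
}

/* Runs the modeled critical section and returns the violation count. */
static int run_switch_conns(bool use_mutex)
{
    atomic_depth = 0;
    sleep_violations = 0;

    if (use_mutex)
        model_mutex_lock();
    else
        model_spin_lock_bh();

    model_switch_link_and_count();

    if (use_mutex)
        model_mutex_unlock();
    else
        model_spin_unlock_bh();

    return sleep_violations;
}
```

In this model, run_switch_conns(false) records one violation and run_switch_conns(true) records none. The model says nothing about the BH concern above: if any path took the lock from softirq context, a mutex would not be usable there, and that is exactly what the call-site audit and independent testing have to establish.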
Thanks,
Paolo