Message-ID: <00f701d1c658$efd720d0$cf856270$@opengridcomputing.com>
Date: Tue, 14 Jun 2016 11:22:25 -0500
From: "Steve Wise" <swise@...ngridcomputing.com>
To: "'Sagi Grimberg'" <sagi@...htbits.io>,
"'Christoph Hellwig'" <hch@....de>, <axboe@...nel.dk>,
<keith.busch@...el.com>, <sean.hefty@...el.com>
Cc: <linux-nvme@...ts.infradead.org>, <linux-block@...r.kernel.org>,
<linux-kernel@...r.kernel.org>,
"'Armen Baloyan'" <armenx.baloyan@...el.com>,
"'Jay Freyensee'" <james.p.freyensee@...el.com>,
"'Ming Lin'" <ming.l@....samsung.com>, <linux-rdma@...r.kernel.org>
Subject: RE: [PATCH 4/5] nvmet-rdma: add a NVMe over Fabrics RDMA target driver
>
> Hey Sean,
>
> Am I correct here? IE: Is it ok for the rdma application to rdma_reject() and
> rdma_destroy_id() the CONNECT_REQUEST cm_id _inside_ its event handler as long
> as it returns 0?
>
> Thanks,
>
> Steve.
Looking at rdma_destroy_id(), I think it is invalid to call it from the event
handler:
void rdma_destroy_id(struct rdma_cm_id *id)
{
	<snip>
	/*
	 * Wait for any active callback to finish. New callbacks will find
	 * the id_priv state set to destroying and abort.
	 */
	mutex_lock(&id_priv->handler_mutex);
	mutex_unlock(&id_priv->handler_mutex);
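
A minimal sketch of the self-deadlock (a hypothetical handler, not the nvmet
code): the rdma_cm core invokes the event handler with id_priv->handler_mutex
already held, so calling rdma_destroy_id() on that same cm_id from inside the
handler blocks on a mutex that is never released:

#include <rdma/rdma_cm.h>

static int broken_cm_handler(struct rdma_cm_id *cm_id,
			     struct rdma_cm_event *event)
{
	if (event->event == RDMA_CM_EVENT_CONNECT_REQUEST) {
		/*
		 * BROKEN: rdma_destroy_id() does
		 * mutex_lock(&id_priv->handler_mutex), but the core
		 * already holds handler_mutex while this callback
		 * runs, so this call never returns -- see the hung
		 * kworker trace below.
		 */
		rdma_destroy_id(cm_id);
	}
	return 0;
}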
And indeed, when I tried to destroy the CONNECT_REQUEST cm_id in the nvmet event
handler, the event handler thread got stuck:
INFO: task kworker/u32:0:6275 blocked for more than 120 seconds.
Tainted: G E 4.7.0-rc2-nvmf-all.3+ #81
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u32:0 D ffff880f90737768 0 6275 2 0x10000080
Workqueue: iw_cm_wq cm_work_handler [iw_cm]
ffff880f90737768 ffff880f907376d8 ffffffff81c0b500 0000000000000005
ffff8810226a4940 ffff88102b894490 ffffffffa02cf4cd ffff880f00000000
ffff880fcd917c00 ffff880f00000000 0000000000000004 ffff880f00000000
Call Trace:
[<ffffffffa02cf4cd>] ? stop_ep_timer+0x2d/0xe0 [iw_cxgb4]
[<ffffffff8163e6a7>] schedule+0x47/0xc0
[<ffffffffa024d276>] ? iw_cm_reject+0x96/0xe0 [iw_cm]
[<ffffffff8163e8e5>] schedule_preempt_disabled+0x15/0x20
[<ffffffff8163fd78>] __mutex_lock_slowpath+0x108/0x310
[<ffffffff8163ffb1>] mutex_lock+0x31/0x50
[<ffffffffa0261498>] rdma_destroy_id+0x38/0x200 [rdma_cm]
[<ffffffffa03145f0>] ? nvmet_rdma_queue_connect+0x1a0/0x1a0 [nvmet_rdma]
[<ffffffffa0262fe1>] ? rdma_create_id+0x171/0x1a0 [rdma_cm]
[<ffffffffa03146f8>] nvmet_rdma_cm_handler+0x108/0x168 [nvmet_rdma]
[<ffffffffa026407a>] iw_conn_req_handler+0x1ca/0x240 [rdma_cm]
[<ffffffffa024efc6>] cm_conn_req_handler+0x606/0x680 [iw_cm]
[<ffffffffa024f109>] process_event+0xc9/0xf0 [iw_cm]
[<ffffffffa024f277>] cm_work_handler+0x147/0x1c0 [iw_cm]
[<ffffffff8107d4f6>] ? trace_event_raw_event_workqueue_execute_start+0x66/0xa0
[<ffffffff81081736>] process_one_work+0x1c6/0x550
...
So I withdraw my comment about nvmet. I think the code is fine as-is. The second
reject results in a no-op since the connection request was already rejected by nvmet.
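
As a sketch of that reject-and-return-non-zero pattern (hypothetical handler and
helper, not a copy of nvmet_rdma_cm_handler; the 3-argument rdma_reject() matches
the 4.7-era API): per rdma_create_id()'s kernel-doc, returning a non-zero value
from the callback makes the rdma_cm core destroy the passed-in id, so the handler
never has to call rdma_destroy_id() itself.

#include <rdma/rdma_cm.h>

/*
 * example_queue_connect() is a hypothetical stand-in for the
 * driver's connect-request processing.
 */
static int example_cm_handler(struct rdma_cm_id *cm_id,
			      struct rdma_cm_event *event)
{
	int ret = 0;

	switch (event->event) {
	case RDMA_CM_EVENT_CONNECT_REQUEST:
		ret = example_queue_connect(cm_id, event);
		if (ret) {
			/*
			 * Reject here; if the core issues a second
			 * reject while tearing the id down, it is a
			 * no-op.
			 */
			rdma_reject(cm_id, NULL, 0);
			/*
			 * Returning non-zero tells the rdma_cm core to
			 * destroy cm_id; never call rdma_destroy_id()
			 * from this context.
			 */
		}
		break;
	default:
		break;
	}
	return ret;
}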
Steve.