[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e22e7963-2077-45b8-97c8-8eef07862ef4@linux.dev>
Date: Wed, 12 Apr 2023 09:15:01 +0800
From: Guoqing Jiang <guoqing.jiang@...ux.dev>
To: "Zhijian Li (Fujitsu)" <lizhijian@...itsu.com>,
"haris.iqbal@...os.com" <haris.iqbal@...os.com>,
"jinpu.wang@...os.com" <jinpu.wang@...os.com>,
"jgg@...pe.ca" <jgg@...pe.ca>, "leon@...nel.org" <leon@...nel.org>,
"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH for-next 3/3] RDMA/rtrs: Avoid use-after-free in
rtrs_clt_rdma_cm_handler
On 4/11/23 09:33, Zhijian Li (Fujitsu) wrote:
>
>>> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
>>> index 4c8f42e46e2f..760a7eb51297 100644
>>> --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
>>> +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
>>> @@ -2074,6 +2074,7 @@ static int create_cm(struct rtrs_clt_con *con)
>>> rtrs_err(s, "Failed to resolve address, err: %d\n", err);
>>> goto destroy_cm;
>>> }
>>> +again:
>>> /*
>>> * Combine connection status and session events. This is needed
>>> * for waiting two possible cases: cm_err has something meaningful
>>> @@ -2083,10 +2084,15 @@ static int create_cm(struct rtrs_clt_con *con)
>>> clt_path->state_wq,
>>> con->cm_err || clt_path->state != RTRS_CLT_CONNECTING,
>>> msecs_to_jiffies(RTRS_CONNECT_TIMEOUT_MS));
>>> - if (err == 0 || err == -ERESTARTSYS) {
>>> - if (err == 0)
>>> - err = -ETIMEDOUT;
>>> - /* Timedout or interrupted */
>>> + if (err == -ERESTARTSYS) {
>>> + /* interrupted,
>>> + * try again to avoid the in-flight rtrs_clt_rdma_cm_handler()
>>> + * getting a use-after-free
>>> + */
>>> + goto again;
>>> + } else if (err == 0) {
>>> + err = -ETIMEDOUT;
>>> + /* Timedout */
>>> goto errr;
>>> }
>> Can event handler still be triggered in case of timeout?
> I have never hit such race.
But it is still possible with the theory.
>> And I guess either stop_cm -> rdma_disconnect or destroy_cm -> rdma_destroy_id
>> should prevent this kind of racy issue.
> In practice, they are possible that rtrs_clt_rdma_cm_handler() is in-flight during
> 'either stop_cm -> rdma_disconnect or destroy_cm -> rdma_destroy_id'. rtrs_clt_rdma_cm_handler() and
> cm's cleanup path need to hold mutex_lock(&con->con_mutex), once cm's cleanup path get this lock first
> rtrs_clt_rdma_cm_handler has to sleep, when rtrs_clt_rdma_cm_handler is wakeup again, some resources has been
> freed by cm's cleanup path.
First, stop_cm doesn't need to hold &con->con_mutex. But
rtrs_clt_rdma_cm_handler
is called from rdma core layer which need id_priv->handler_mutexinstead
of con_mutex
I thinnk. Also RTRS has similar behavior as nvme host rdma, ib_srp and
iser_verb.
@Jinpu/Haris, can we move destroy_cm right after stop_cm?*
*Thanks,
Guoqing
Powered by blists - more mailing lists