linux-kernel - [PING][PATCH] IBcore/CM: Issue DREQ when receiving REQ/REP for stale QP


Message-ID: <a24f2cb5-1a4d-8d23-d729-70a3014d20d7@oracle.com>
Date:   Wed, 1 Feb 2017 12:17:05 +0100
From:   Hans Westgaard Ry <hans.westgaard.ry@...cle.com>
To:     Doug Ledford <dledford@...hat.com>,
        Sean Hefty <sean.hefty@...el.com>,
        Hal Rosenstock <hal.rosenstock@...il.com>,
        Matan Barak <matanb@...lanox.com>,
        Erez Shitrit <erezsh@...lanox.com>,
        Bart Van Assche <bart.vanassche@...disk.com>,
        Ira Weiny <ira.weiny@...el.com>,
        Or Gerlitz <ogerlitz@...lanox.com>,
        Hakon Bugge <haakon.bugge@...cle.com>,
        Yuval Shaia <yuval.shaia@...cle.com>,
        linux-rdma@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: [PING][PATCH] IBcore/CM: Issue DREQ when receiving REQ/REP for stale
 QP



On 10/28/2016 01:14 PM, Hans Westgaard Ry wrote:
> From the "InfiniBand Architecture Specification Volume 1":
>
>    A QP is said to have a stale connection when only one side has
>    connection information. A stale connection may result if the remote CM
>    had dropped the connection and sent a DREQ but the DREQ was never
>    received by the local CM. Alternatively the remote CM may have lost
>    all record of past connections because its node crashed and rebooted,
>    while the local CM did not become aware of the remote node's reboot
>    and therefore did not clean up stale connections.
>
> and:
>
>     A local CM may receive a REQ/REP for a stale connection. It shall
>     abort the connection issuing REJ to the REQ/REP. It shall then issue
>     DREQ with "DREQ:remote QPN" set to the remote QPN from the REQ/REP.
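The rule quoted above is a small decision: if an incoming REQ/REP names a remote QPN the local CM already holds connection state for, answer with a REJ (stale connection) and then a DREQ whose remote QPN field is the QPN from the REQ/REP. The sketch below models only that decision in plain C. Every name in it (`conn_table`, `handle_req_or_rep`, `ACT_REJ_STALE_THEN_DREQ`, and so on) is hypothetical and is not the kernel CM API, and keying the table on the remote QPN alone is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcomes; these are not kernel identifiers. */
enum cm_action {
	ACT_ACCEPT,              /* no stale state: process the REQ/REP normally */
	ACT_REJ_STALE_THEN_DREQ  /* send REJ (stale connection), then a DREQ */
};

/* Hypothetical per-connection record kept by the local CM. */
struct conn_entry {
	uint32_t remote_qpn;
	bool in_use;
};

#define MAX_CONNS 8

struct conn_table {
	struct conn_entry e[MAX_CONNS];
};

struct cm_response {
	enum cm_action action;
	uint32_t dreq_remote_qpn; /* meaningful only for ACT_REJ_STALE_THEN_DREQ */
};

/* True if the local CM still holds state for a connection to remote_qpn. */
static bool has_conn_for_qpn(const struct conn_table *t, uint32_t remote_qpn)
{
	assert(t != NULL);
	for (size_t i = 0; i < MAX_CONNS; i++)
		if (t->e[i].in_use && t->e[i].remote_qpn == remote_qpn)
			return true;
	return false;
}

/* Decide how to answer a REQ/REP that names remote_qpn. */
static struct cm_response handle_req_or_rep(const struct conn_table *t,
					    uint32_t remote_qpn)
{
	struct cm_response r = { ACT_ACCEPT, 0 };

	if (has_conn_for_qpn(t, remote_qpn)) {
		/* Stale connection: reject, then tear down the old connection
		 * with a DREQ carrying the remote QPN from the REQ/REP. */
		r.action = ACT_REJ_STALE_THEN_DREQ;
		r.dreq_remote_qpn = remote_qpn;
	}
	return r;
}
```

With an entry for QPN 0x48 in the table, a REQ naming 0x48 yields `ACT_REJ_STALE_THEN_DREQ` with `dreq_remote_qpn == 0x48`, while a REQ naming an unknown QPN is accepted.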
>
> This patch solves a problem with reuse of QPNs. The current code base,
> in particular IPoIB, relies on a reap mechanism to clean up the
> structures in the CM. The problem is the time constants governing this
> mechanism: they can be as long as 768 seconds, and the interface may
> appear unresponsive during that period. Issuing a DREQ (and receiving
> a DREP) performs the necessary cleanup, and the interface comes up.
>
> Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@...cle.com>
> Reviewed-by: Håkon Bugge <haakon.bugge@...cle.com>
> ---
>   drivers/infiniband/core/cm.c | 24 +++++++++++++++++++++++-
>   1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index c995255..c97e4d5 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -1519,6 +1519,7 @@ static struct cm_id_private * cm_match_req(struct cm_work *work,
>   	struct cm_id_private *listen_cm_id_priv, *cur_cm_id_priv;
>   	struct cm_timewait_info *timewait_info;
>   	struct cm_req_msg *req_msg;
> +	struct ib_cm_id *cm_id;
>   
>   	req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad;
>   
> @@ -1540,10 +1541,18 @@ static struct cm_id_private * cm_match_req(struct cm_work *work,
>   	timewait_info = cm_insert_remote_qpn(cm_id_priv->timewait_info);
>   	if (timewait_info) {
>   		cm_cleanup_timewait(cm_id_priv->timewait_info);
> +		cur_cm_id_priv = cm_get_id(timewait_info->work.local_id,
> +					   timewait_info->work.remote_id);
> +
>   		spin_unlock_irq(&cm.lock);
>   		cm_issue_rej(work->port, work->mad_recv_wc,
>   			     IB_CM_REJ_STALE_CONN, CM_MSG_RESPONSE_REQ,
>   			     NULL, 0);
> +		if (cur_cm_id_priv) {
> +			cm_id = &cur_cm_id_priv->id;
> +			ib_send_cm_dreq(cm_id, NULL, 0);
> +			cm_deref_id(cur_cm_id_priv);
> +		}
>   		return NULL;
>   	}
>   
> @@ -1919,6 +1928,9 @@ static int cm_rep_handler(struct cm_work *work)
>   	struct cm_id_private *cm_id_priv;
>   	struct cm_rep_msg *rep_msg;
>   	int ret;
> +	struct cm_id_private *cur_cm_id_priv;
> +	struct ib_cm_id *cm_id;
> +	struct cm_timewait_info *timewait_info;
>   
>   	rep_msg = (struct cm_rep_msg *)work->mad_recv_wc->recv_buf.mad;
>   	cm_id_priv = cm_acquire_id(rep_msg->remote_comm_id, 0);
> @@ -1953,16 +1965,26 @@ static int cm_rep_handler(struct cm_work *work)
>   		goto error;
>   	}
>   	/* Check for a stale connection. */
> -	if (cm_insert_remote_qpn(cm_id_priv->timewait_info)) {
> +	timewait_info = cm_insert_remote_qpn(cm_id_priv->timewait_info);
> +	if (timewait_info) {
>   		rb_erase(&cm_id_priv->timewait_info->remote_id_node,
>   			 &cm.remote_id_table);
>   		cm_id_priv->timewait_info->inserted_remote_id = 0;
> +		cur_cm_id_priv = cm_get_id(timewait_info->work.local_id,
> +					   timewait_info->work.remote_id);
> +
>   		spin_unlock(&cm.lock);
>   		spin_unlock_irq(&cm_id_priv->lock);
>   		cm_issue_rej(work->port, work->mad_recv_wc,
>   			     IB_CM_REJ_STALE_CONN, CM_MSG_RESPONSE_REP,
>   			     NULL, 0);
>   		ret = -EINVAL;
> +		if (cur_cm_id_priv) {
> +			cm_id = &cur_cm_id_priv->id;
> +			ib_send_cm_dreq(cm_id, NULL, 0);
> +			cm_deref_id(cur_cm_id_priv);
> +		}
> +
>   		goto error;
>   	}
>   	spin_unlock(&cm.lock);
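Both hunks follow the same ordering: while `cm.lock` is held, `cm_get_id()` looks up the cm_id recorded in the conflicting `timewait_info` and takes a reference; the lock is then dropped, the REJ is sent, and only afterwards are `ib_send_cm_dreq()` and `cm_deref_id()` called. The user-space sketch below models that ordering, assuming a pthread mutex in place of `cm.lock` (the REP path's second lock, `cm_id_priv->lock`, is not modeled) and a C11 atomic counter in place of the cm_id reference count. All names are hypothetical, the "send" functions only increment counters, and freeing on the last reference is omitted. It illustrates that the reference, rather than the lock, is what keeps the stale entry valid while the REJ and DREQ go out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cm_id_private: ids plus a refcount. */
struct conn {
	uint32_t local_id;
	uint32_t remote_id;
	atomic_int refcount;
};

/* Hypothetical global table protected by table_lock (models cm.lock). */
#define TABLE_SIZE 4
static struct conn *table[TABLE_SIZE];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Counters recording what would have been sent; illustration only. */
static int rej_sent, dreq_sent;

/* Caller must hold table_lock. On a match, returns the entry with an
 * extra reference taken (models cm_get_id()). */
static struct conn *conn_get_locked(uint32_t local_id, uint32_t remote_id)
{
	for (size_t i = 0; i < TABLE_SIZE; i++) {
		struct conn *c = table[i];
		if (c && c->local_id == local_id && c->remote_id == remote_id) {
			atomic_fetch_add(&c->refcount, 1);
			return c;
		}
	}
	return NULL;
}

/* Drops a reference (models cm_deref_id()); freeing is omitted. */
static void conn_put(struct conn *c)
{
	int old = atomic_fetch_sub(&c->refcount, 1);
	assert(old > 0);
}

static void send_rej_stale(void) { rej_sent++; }
static void send_dreq(struct conn *c) { (void)c; dreq_sent++; }

/* Models the patched stale-connection branch: pin the stale entry under
 * the lock, then send REJ and DREQ with the lock released. */
static void handle_stale(uint32_t local_id, uint32_t remote_id)
{
	pthread_mutex_lock(&table_lock);
	struct conn *stale = conn_get_locked(local_id, remote_id);
	pthread_mutex_unlock(&table_lock);

	send_rej_stale();          /* the REJ is always sent first */
	if (stale) {
		send_dreq(stale);  /* DREQ only if the stale entry was found */
		conn_put(stale);
	}
}
```

If no matching entry is found, only the REJ is sent, which mirrors the `if (cur_cm_id_priv)` guard in the patch.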