Date: Wed, 1 Feb 2017 12:17:05 +0100
From: Hans Westgaard Ry <hans.westgaard.ry@...cle.com>
To: Doug Ledford <dledford@...hat.com>, Sean Hefty <sean.hefty@...el.com>,
	Hal Rosenstock <hal.rosenstock@...il.com>,
	Matan Barak <matanb@...lanox.com>, Erez Shitrit <erezsh@...lanox.com>,
	Bart Van Assche <bart.vanassche@...disk.com>,
	Ira Weiny <ira.weiny@...el.com>, Or Gerlitz <ogerlitz@...lanox.com>,
	Hakon Bugge <haakon.bugge@...cle.com>,
	Yuval Shaia <yuval.shaia@...cle.com>,
	linux-rdma@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: [PING][PATCH] IBcore/CM: Issue DREQ when receiving REQ/REP for stale QP

On 10/28/2016 01:14 PM, Hans Westgaard Ry wrote:
> from "InfiniBand Architecture Specifications Volume 1":
>
> A QP is said to have a stale connection when only one side has
> connection information. A stale connection may result if the remote CM
> had dropped the connection and sent a DREQ but the DREQ was never
> received by the local CM. Alternatively the remote CM may have lost
> all record of past connections because its node crashed and rebooted,
> while the local CM did not become aware of the remote node's reboot
> and therefore did not clean up stale connections.
>
> and:
>
> A local CM may receive a REQ/REP for a stale connection. It shall
> abort the connection issuing REJ to the REQ/REP. It shall then issue
> DREQ with "DREQ:remote QPN" set to the remote QPN from the REQ/REP.
>
> This patch solves a problem with reuse of QPN. Current codebase, that
> is IPoIB, relies on a REAP-mechanism to do cleanup of the structures
> in CM. A problem with this is the time constants governing this
> mechanism; they are up to 768 seconds and the interface may look
> unresponsive in that period. Issuing a DREQ (and receiving a DREP)
> does the necessary cleanup and the interface comes up.
>
> Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@...cle.com>
> Reviewed-by: Håkon Bugge <haakon.bugge@...cle.com>
> ---
>  drivers/infiniband/core/cm.c | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index c995255..c97e4d5 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -1519,6 +1519,7 @@ static struct cm_id_private * cm_match_req(struct cm_work *work,
>  	struct cm_id_private *listen_cm_id_priv, *cur_cm_id_priv;
>  	struct cm_timewait_info *timewait_info;
>  	struct cm_req_msg *req_msg;
> +	struct ib_cm_id *cm_id;
>
>  	req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad;
>
> @@ -1540,10 +1541,18 @@ static struct cm_id_private * cm_match_req(struct cm_work *work,
>  	timewait_info = cm_insert_remote_qpn(cm_id_priv->timewait_info);
>  	if (timewait_info) {
>  		cm_cleanup_timewait(cm_id_priv->timewait_info);
> +		cur_cm_id_priv = cm_get_id(timewait_info->work.local_id,
> +					   timewait_info->work.remote_id);
> +
>  		spin_unlock_irq(&cm.lock);
>  		cm_issue_rej(work->port, work->mad_recv_wc,
>  			     IB_CM_REJ_STALE_CONN, CM_MSG_RESPONSE_REQ,
>  			     NULL, 0);
> +		if (cur_cm_id_priv) {
> +			cm_id = &cur_cm_id_priv->id;
> +			ib_send_cm_dreq(cm_id, NULL, 0);
> +			cm_deref_id(cur_cm_id_priv);
> +		}
>  		return NULL;
>  	}
>
> @@ -1919,6 +1928,9 @@ static int cm_rep_handler(struct cm_work *work)
>  	struct cm_id_private *cm_id_priv;
>  	struct cm_rep_msg *rep_msg;
>  	int ret;
> +	struct cm_id_private *cur_cm_id_priv;
> +	struct ib_cm_id *cm_id;
> +	struct cm_timewait_info *timewait_info;
>
>  	rep_msg = (struct cm_rep_msg *)work->mad_recv_wc->recv_buf.mad;
>  	cm_id_priv = cm_acquire_id(rep_msg->remote_comm_id, 0);
> @@ -1953,16 +1965,26 @@ static int cm_rep_handler(struct cm_work *work)
>  		goto error;
>  	}
>  	/* Check for a stale connection. */
> -	if (cm_insert_remote_qpn(cm_id_priv->timewait_info)) {
> +	timewait_info = cm_insert_remote_qpn(cm_id_priv->timewait_info);
> +	if (timewait_info) {
>  		rb_erase(&cm_id_priv->timewait_info->remote_id_node,
>  			 &cm.remote_id_table);
>  		cm_id_priv->timewait_info->inserted_remote_id = 0;
> +		cur_cm_id_priv = cm_get_id(timewait_info->work.local_id,
> +					   timewait_info->work.remote_id);
> +
>  		spin_unlock(&cm.lock);
>  		spin_unlock_irq(&cm_id_priv->lock);
>  		cm_issue_rej(work->port, work->mad_recv_wc,
>  			     IB_CM_REJ_STALE_CONN, CM_MSG_RESPONSE_REP,
>  			     NULL, 0);
>  		ret = -EINVAL;
> +		if (cur_cm_id_priv) {
> +			cm_id = &cur_cm_id_priv->id;
> +			ib_send_cm_dreq(cm_id, NULL, 0);
> +			cm_deref_id(cur_cm_id_priv);
> +		}
> +
>  		goto error;
>  	}
>  	spin_unlock(&cm.lock);
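
For readers who want the sequence from the commit message in isolation (REJ for the stale REQ, then DREQ, with cleanup once the DREP arrives), here is a minimal user-space sketch in C. It is not the kernel code and does not use the kernel CM API: toy_cm, toy_handle_req, toy_handle_drep and the TOY_* names are all hypothetical, and a small flat array stands in for the CM's internal lookup tables. It also assumes a connection is identified by its remote QPN alone, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reply kinds; these are NOT the kernel's CM message types. */
enum toy_msg { TOY_MSG_REJ_STALE, TOY_MSG_DREQ };

#define TOY_MAX_CONNS   8
#define TOY_MAX_REPLIES 2

/* One tracked connection, identified only by remote QPN (an assumption). */
struct toy_conn {
	bool in_use;
	uint32_t remote_qpn;
};

/* Toy stand-in for the CM's connection tables. */
struct toy_cm {
	struct toy_conn conns[TOY_MAX_CONNS];
};

struct toy_reply {
	enum toy_msg kind;
	uint32_t remote_qpn;
};

/* Return the tracked connection for remote_qpn, or NULL if there is none. */
static struct toy_conn *toy_find(struct toy_cm *cm, uint32_t remote_qpn)
{
	for (size_t i = 0; i < TOY_MAX_CONNS; i++) {
		if (cm->conns[i].in_use && cm->conns[i].remote_qpn == remote_qpn)
			return &cm->conns[i];
	}
	return NULL;
}

/*
 * Handle an incoming REQ. Returns the number of replies written to out,
 * or -1 if the table is full. If the QPN is already tracked, the existing
 * record is treated as stale: reply with REJ first, then DREQ carrying
 * the remote QPN from the REQ. The record stays until the DREP arrives.
 */
static int toy_handle_req(struct toy_cm *cm, uint32_t remote_qpn,
			  struct toy_reply out[TOY_MAX_REPLIES])
{
	assert(cm != NULL && out != NULL);

	if (toy_find(cm, remote_qpn)) {
		out[0] = (struct toy_reply){ TOY_MSG_REJ_STALE, remote_qpn };
		out[1] = (struct toy_reply){ TOY_MSG_DREQ, remote_qpn };
		return 2;
	}

	for (size_t i = 0; i < TOY_MAX_CONNS; i++) {
		if (!cm->conns[i].in_use) {
			cm->conns[i].in_use = true;
			cm->conns[i].remote_qpn = remote_qpn;
			return 0;
		}
	}
	return -1;
}

/* Handle the DREP answering our DREQ: drop the stale record. */
static bool toy_handle_drep(struct toy_cm *cm, uint32_t remote_qpn)
{
	struct toy_conn *conn = toy_find(cm, remote_qpn);

	if (!conn)
		return false;
	conn->in_use = false;
	return true;
}
```

In this model, a second toy_handle_req() for the same QPN yields the REJ/DREQ pair, and after toy_handle_drep() a retried REQ is accepted immediately. That immediate cleanup is what the patch aims for in place of waiting for the reap timer.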