lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20211115165420.339019519@linuxfoundation.org>
Date:   Mon, 15 Nov 2021 17:51:33 +0100
From:   Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To:     linux-kernel@...r.kernel.org
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        stable@...r.kernel.org, Justin Tee <justin.tee@...adcom.com>,
        James Smart <jsmart2021@...il.com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>
Subject: [PATCH 5.14 010/849] scsi: lpfc: Dont release final kref on Fport node while ABTS outstanding

From: James Smart <jsmart2021@...il.com>

commit 982fc3965d1350d3332e04046b0e101006184ba9 upstream.

In a rarely executed path, FLOGI failure, there is a refcounting error.  If
FLOGI completed with an error, typically a timeout, the initial completion
handler would remove the job reference. However, the job completion isn't
the actual end of the job/exchange as the timeout usually initiates an
ABTS, and upon that ABTS completion, a final completion is sent. The driver
removes the reference again in the final completion. Thus the imbalance.

In the buggy cases, if there was a link bounce while the delayed response
is outstanding, the fport node may be referenced again but there was no
additional reference as it is already present. The delayed completion then
occurs and removes the last reference freeing the node and causing issues
in the link up processed that is using the node.

Fix this scenario by removing the snippet that removed the reference in the
initial FLOGI completion. The bad snippet was poorly trying to identify the
FLOGI as OK to do so by realizing the node was not registered with either
SCSI or NVMe transport.

Link: https://lore.kernel.org/r/20210910233159.115896-3-jsmart2021@gmail.com
Fixes: 618e2ee146d4 ("scsi: lpfc: Fix FLOGI failure due to accessing a freed node")
Cc: <stable@...r.kernel.org> # v5.13+
Co-developed-by: Justin Tee <justin.tee@...adcom.com>
Signed-off-by: Justin Tee <justin.tee@...adcom.com>
Signed-off-by: James Smart <jsmart2021@...il.com>
Signed-off-by: Martin K. Petersen <martin.petersen@...cle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
---
 drivers/scsi/lpfc/lpfc_els.c     |   11 +++++------
 drivers/scsi/lpfc/lpfc_hbadisc.c |   10 ++++++----
 drivers/scsi/lpfc/lpfc_nvme.c    |    5 +++--
 3 files changed, 14 insertions(+), 12 deletions(-)

--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -1056,9 +1056,10 @@ stop_rr_fcf_flogi:
 
 		lpfc_printf_vlog(vport, KERN_WARNING, LOG_TRACE_EVENT,
 				 "0150 FLOGI failure Status:x%x/x%x "
-				 "xri x%x TMO:x%x\n",
+				 "xri x%x TMO:x%x refcnt %d\n",
 				 irsp->ulpStatus, irsp->un.ulpWord[4],
-				 cmdiocb->sli4_xritag, irsp->ulpTimeout);
+				 cmdiocb->sli4_xritag, irsp->ulpTimeout,
+				 kref_read(&ndlp->kref));
 
 		/* If this is not a loop open failure, bail out */
 		if (!(irsp->ulpStatus == IOSTAT_LOCAL_REJECT &&
@@ -1119,12 +1120,12 @@ stop_rr_fcf_flogi:
 	/* FLOGI completes successfully */
 	lpfc_printf_vlog(vport, KERN_INFO, LOG_ELS,
 			 "0101 FLOGI completes successfully, I/O tag:x%x, "
-			 "xri x%x Data: x%x x%x x%x x%x x%x x%x x%x\n",
+			 "xri x%x Data: x%x x%x x%x x%x x%x x%x x%x %d\n",
 			 cmdiocb->iotag, cmdiocb->sli4_xritag,
 			 irsp->un.ulpWord[4], sp->cmn.e_d_tov,
 			 sp->cmn.w2.r_a_tov, sp->cmn.edtovResolution,
 			 vport->port_state, vport->fc_flag,
-			 sp->cmn.priority_tagging);
+			 sp->cmn.priority_tagging, kref_read(&ndlp->kref));
 
 	if (sp->cmn.priority_tagging)
 		vport->vmid_flag |= LPFC_VMID_ISSUE_QFPA;
@@ -1202,8 +1203,6 @@ flogifail:
 	phba->fcf.fcf_flag &= ~FCF_DISCOVERY;
 	spin_unlock_irq(&phba->hbalock);
 
-	if (!(ndlp->fc4_xpt_flags & (SCSI_XPT_REGD | NVME_XPT_REGD)))
-		lpfc_nlp_put(ndlp);
 	if (!lpfc_error_lost_link(irsp)) {
 		/* FLOGI failed, so just use loop map to make discovery list */
 		lpfc_disc_list_loopmap(vport);
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -4429,8 +4429,9 @@ lpfc_register_remote_port(struct lpfc_vp
 		fc_remote_port_rolechg(rport, rport_ids.roles);
 
 	lpfc_printf_vlog(ndlp->vport, KERN_INFO, LOG_NODE,
-			 "3183 %s rport x%px DID x%x, role x%x\n",
-			 __func__, rport, rport->port_id, rport->roles);
+			 "3183 %s rport x%px DID x%x, role x%x refcnt %d\n",
+			 __func__, rport, rport->port_id, rport->roles,
+			 kref_read(&ndlp->kref));
 
 	if ((rport->scsi_target_id != -1) &&
 	    (rport->scsi_target_id < LPFC_MAX_TARGET)) {
@@ -4455,8 +4456,9 @@ lpfc_unregister_remote_port(struct lpfc_
 
 	lpfc_printf_vlog(vport, KERN_INFO, LOG_NODE,
 			 "3184 rport unregister x%06x, rport x%px "
-			 "xptflg x%x\n",
-			 ndlp->nlp_DID, rport, ndlp->fc4_xpt_flags);
+			 "xptflg x%x refcnt %d\n",
+			 ndlp->nlp_DID, rport, ndlp->fc4_xpt_flags,
+			 kref_read(&ndlp->kref));
 
 	fc_remote_port_delete(rport);
 	lpfc_nlp_put(ndlp);
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -209,8 +209,9 @@ lpfc_nvme_remoteport_delete(struct nvme_
 	 * calling state machine to remove the node.
 	 */
 	lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME_DISC,
-			"6146 remoteport delete of remoteport x%px\n",
-			remoteport);
+			 "6146 remoteport delete of remoteport x%px, ndlp x%px "
+			 "DID x%x xflags x%x\n",
+			 remoteport, ndlp, ndlp->nlp_DID, ndlp->fc4_xpt_flags);
 	spin_lock_irq(&ndlp->lock);
 
 	/* The register rebind might have occurred before the delete


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ