Message-ID: <20250129004337.36898-3-shannon.nelson@amd.com>
Date: Tue, 28 Jan 2025 16:43:37 -0800
From: Shannon Nelson <shannon.nelson@....com>
To: <netdev@...r.kernel.org>, <davem@...emloft.net>, <kuba@...nel.org>,
	<andrew+netdev@...n.ch>, <edumazet@...gle.com>, <pabeni@...hat.com>
CC: <brett.creeley@....com>, Shannon Nelson <shannon.nelson@....com>
Subject: [PATCH net 2/2] pds_core: Add a retry mechanism when the adminq is full

From: Brett Creeley <brett.creeley@....com>

If the adminq is full, the driver reports failure when trying to post
new adminq commands. This is overly aggressive and unexpected, because
the post itself did not fail; the adminq was simply full at that moment.
To harden this path, add support for a bounded retry mechanism.
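
A bounded retry keeps re-posting while the queue reports "full" and gives up once a deadline passes. The following is a minimal userspace sketch of that pattern, not the driver code: it assumes a hypothetical post callback that returns -EAGAIN while the queue is full, and it uses clock_gettime() and nanosleep() in place of the kernel's jiffies and msleep(1). The fake_queue type and fake_post() are hypothetical stand-ins for the adminq.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical post callback: returns a slot index (>= 0) on success,
 * -EAGAIN while the queue is full, or another negative errno. */
typedef int (*post_fn)(void *ctx);

/* Monotonic clock in milliseconds, unaffected by wall-clock changes. */
static uint64_t now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000u + (uint64_t)ts.tv_nsec / 1000000u;
}

/* Bounded retry: call post() until it returns something other than
 * -EAGAIN or until timeout_ms has elapsed, sleeping ~1 ms between tries
 * so the consumer side has a chance to free a slot. */
static int post_with_retry(post_fn post, void *ctx, unsigned int timeout_ms)
{
	const struct timespec one_ms = { .tv_sec = 0, .tv_nsec = 1000000L };
	const uint64_t deadline = now_ms() + timeout_ms;
	int ret;

	assert(post != NULL);
	do {
		ret = post(ctx);
		if (ret != -EAGAIN)
			break;
		nanosleep(&one_ms, NULL);
	} while (now_ms() < deadline);

	return ret; /* may still be -EAGAIN if the queue never drained */
}

/* Hypothetical stand-in for a full queue: reports -EAGAIN for the first
 * busy_polls attempts, then hands out slot 0. */
struct fake_queue {
	int busy_polls;
};

static int fake_post(void *ctx)
{
	struct fake_queue *q = ctx;

	if (q->busy_polls > 0) {
		q->busy_polls--;
		return -EAGAIN;
	}
	return 0;
}
```

With a queue that stays busy for three attempts, post_with_retry(fake_post, &q, 1000) returns 0 after roughly 3 ms; with a queue that never drains, it returns -EAGAIN once the timeout expires, which mirrors what callers of pdsc_adminq_post() can still observe.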

Some commands may take longer than expected to complete, possibly
hundreds of milliseconds or even seconds, due to other processing on the
device side. To further reduce the chance of failing because the adminq
is full, increase PDS_CORE_DEVCMD_TIMEOUT from 5 to 10 seconds.
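
In the diff, the deadline is computed once as time_start + HZ * devcmd_timeout before the new retry loop, and the old computation after it is removed, so the same 10-second budget appears to cover both waiting for adminq space and waiting for completion. The loop compares against that deadline with time_before(), which stays correct when the jiffies counter wraps. The sketch below models that comparison in userspace; ticks_before() and deadline_ticks() are hypothetical names, and HZ = 250 is only an assumed example value (the real HZ depends on the kernel configuration). It relies on the unsigned-to-signed conversion wrapping modulo 2^N, as GCC and Clang define it and as the kernel itself assumes.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the kernel's time_before(a, b): compare via the
 * signed difference so the result is still right after the tick
 * counter wraps around. */
static bool ticks_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Deadline computed the same way as in the patch:
 * start + HZ * timeout_seconds (unsigned, so it may wrap). */
static unsigned long deadline_ticks(unsigned long start, unsigned long hz,
				    unsigned long timeout_s)
{
	return start + hz * timeout_s;
}
```

For example, starting 100 ticks before the counter wraps, a 10-second deadline at an assumed HZ of 250 wraps to 2400. A naive start < limit check would then report the deadline as already passed, while ticks_before() still treats every tick up to start + 2499 as "before" the limit.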

The caller of pdsc_adminq_post() may still see -EAGAIN if space in the
adminq never frees up before the timeout expires. In that case the
caller can choose to call the function again or fail. If the firmware
is not running, -ENXIO is returned instead. For now, no callers retry.
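
After this patch a caller of pdsc_adminq_post() can distinguish "queue still full" (-EAGAIN) from "firmware not running" (-ENXIO), where previously both paths reported -ENOSPC. A minimal sketch of how a hypothetical caller might classify the result follows; the enum and function name are illustrative only, and as the message notes, no in-tree caller currently retries.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical caller-side actions for a pdsc_adminq_post() result. */
enum post_action {
	POST_OK,
	POST_RETRY,
	POST_FAIL,
};

/* Only -EAGAIN (adminq still full when the bounded wait expired) is a
 * candidate for another attempt; -ENXIO (firmware not running) and any
 * other negative errno are treated as failures. */
static enum post_action adminq_post_action(int ret)
{
	if (ret >= 0)
		return POST_OK;		/* non-negative: success */
	if (ret == -EAGAIN)
		return POST_RETRY;	/* caller may call again or give up */
	return POST_FAIL;
}
```

Keeping -ENXIO out of the retry path avoids spinning on a device whose firmware has stopped, where waiting for the adminq to drain cannot succeed.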

Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands")
Signed-off-by: Brett Creeley <brett.creeley@....com>
Signed-off-by: Shannon Nelson <shannon.nelson@....com>
---
 drivers/net/ethernet/amd/pds_core/adminq.c | 22 ++++++++++++++++++----
 include/linux/pds/pds_core_if.h            |  2 +-
 2 files changed, 19 insertions(+), 5 deletions(-)
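
The first hunk below shows only the tail of the free-slot computation in __pdsc_adminq_post() and the new choice between -ENXIO and -EAGAIN when no slot is free. The sketch that follows is a userspace model under stated assumptions: head_idx is taken to be the producer index, tail_idx the consumer index, and one slot is assumed to stay unused so that head == tail means "empty". Only the else branch matches the hunk; the if branch and the helper names ring_avail() and full_ring_status() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Free descriptors in a ring of num_descs entries, where head is the
 * next slot the driver fills and tail the next slot the device
 * completes. One slot is always left empty to tell "full" from "empty".
 * The else branch matches the hunk; the if branch is assumed. */
static unsigned int ring_avail(unsigned int head, unsigned int tail,
			       unsigned int num_descs)
{
	unsigned int avail = tail;

	assert(num_descs > 1 && head < num_descs && tail < num_descs);
	if (head >= avail)
		avail += num_descs - head - 1;
	else
		avail -= head + 1;
	return avail;
}

/* Error choice when the ring is full, modeled on the patched hunk:
 * -EAGAIN if the firmware can still drain the queue, -ENXIO if not. */
static int full_ring_status(unsigned int avail, bool fw_running)
{
	if (avail)
		return 0;
	return fw_running ? -EAGAIN : -ENXIO;
}
```

With 16 descriptors, an empty ring (head == tail == 0) reports 15 free slots, and head = 4 with tail = 5 reports 0, which is the case where the patched code now returns -EAGAIN (or -ENXIO if the firmware is down) instead of -ENOSPC.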

diff --git a/drivers/net/ethernet/amd/pds_core/adminq.c b/drivers/net/ethernet/amd/pds_core/adminq.c
index c83a0a80d533..387de1712827 100644
--- a/drivers/net/ethernet/amd/pds_core/adminq.c
+++ b/drivers/net/ethernet/amd/pds_core/adminq.c
@@ -181,7 +181,10 @@ static int __pdsc_adminq_post(struct pdsc *pdsc,
 	else
 		avail -= q->head_idx + 1;
 	if (!avail) {
-		ret = -ENOSPC;
+		if (!pdsc_is_fw_running(pdsc))
+			ret = -ENXIO;
+		else
+			ret = -EAGAIN;
 		goto err_out_unlock;
 	}
 
@@ -251,14 +254,25 @@ int pdsc_adminq_post(struct pdsc *pdsc,
 	}
 
 	wc.qcq = &pdsc->adminqcq;
-	index = __pdsc_adminq_post(pdsc, &pdsc->adminqcq, cmd, comp, &wc);
+	time_start = jiffies;
+	time_limit = time_start + HZ * pdsc->devcmd_timeout;
+	do {
+		index = __pdsc_adminq_post(pdsc, &pdsc->adminqcq, cmd, comp,
+					   &wc);
+		if (index != -EAGAIN)
+			break;
+
+		dev_dbg(pdsc->dev, "Retrying adminq cmd opcode %u\n",
+			cmd->opcode);
+		/* Give completion processing a chance to free up space */
+		msleep(1);
+	} while (time_before(jiffies, time_limit));
+
 	if (index < 0) {
 		err = index;
 		goto err_out;
 	}
 
-	time_start = jiffies;
-	time_limit = time_start + HZ * pdsc->devcmd_timeout;
 	do {
 		/* Timeslice the actual wait to catch IO errors etc early */
 		poll_jiffies = msecs_to_jiffies(poll_interval);
diff --git a/include/linux/pds/pds_core_if.h b/include/linux/pds/pds_core_if.h
index 17a87c1a55d7..babc6d573acd 100644
--- a/include/linux/pds/pds_core_if.h
+++ b/include/linux/pds/pds_core_if.h
@@ -22,7 +22,7 @@
 #define PDS_CORE_BAR0_INTR_CTRL_OFFSET		0x2000
 #define PDS_CORE_DEV_CMD_DONE			0x00000001
 
-#define PDS_CORE_DEVCMD_TIMEOUT			5
+#define PDS_CORE_DEVCMD_TIMEOUT			10
 
 #define PDS_CORE_CLIENT_ID			0
 #define PDS_CORE_ASIC_TYPE_CAPRI		0
-- 
2.17.1

