netdev - [PATCH net v2 1/3] ice: avoid IRQ collision to fix init failure on ACPI S3 resume

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20240613154514.1948785-2-anthony.l.nguyen@intel.com>
Date: Thu, 13 Jun 2024 08:45:08 -0700
From: Tony Nguyen <anthony.l.nguyen@...el.com>
To: davem@...emloft.net,
	kuba@...nel.org,
	pabeni@...hat.com,
	edumazet@...gle.com,
	netdev@...r.kernel.org
Cc: En-Wei Wu <en-wei.wu@...onical.com>,
	anthony.l.nguyen@...el.com,
	david.m.ertman@...el.com,
	shiraz.saleem@...el.com,
	Cyrus Lien <cyrus.lien@...onical.com>,
	Wojciech Drewek <wojciech.drewek@...el.com>,
	Pucha Himasekhar Reddy <himasekharx.reddy.pucha@...el.com>
Subject: [PATCH net v2 1/3] ice: avoid IRQ collision to fix init failure on ACPI S3 resume

From: En-Wei Wu <en-wei.wu@...onical.com>

A bug in https://bugzilla.kernel.org/show_bug.cgi?id=218906 describes
that irdma would break and report hardware initialization failed after
suspend/resume with Intel E810 NIC (tested on 6.9.0-rc5).

The problem is caused due to the collision between the irq numbers
requested in irdma and the irq numbers requested in other drivers
after suspend/resume.

The irq numbers used by irdma are derived from ice's ice_pf->msix_entries
which stores mappings between MSI-X index and Linux interrupt number.
It's supposed to be cleaned up when suspend and rebuilt in resume but
it's not, causing irdma using the old irq numbers stored in the old
ice_pf->msix_entries to request_irq() when resume. And eventually
collide with other drivers.

This patch fixes this problem. On suspend, we call ice_deinit_rdma() to
clean up the ice_pf->msix_entries (and free the MSI-X vectors used by
irdma if we've dynamically allocated them). On resume, we call
ice_init_rdma() to rebuild the ice_pf->msix_entries (and allocate the
MSI-X vectors if we would like to dynamically allocate them).

Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA")
Tested-by: Cyrus Lien <cyrus.lien@...onical.com>
Signed-off-by: En-Wei Wu <en-wei.wu@...onical.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@...el.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@...el.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@...el.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 1b61ca3a6eb6..45d850514f4c 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -5564,7 +5564,7 @@ static int ice_suspend(struct device *dev)
 	 */
 	disabled = ice_service_task_stop(pf);

-	ice_unplug_aux_dev(pf);
+	ice_deinit_rdma(pf);

 	/* Already suspended?, then there is nothing to do */
 	if (test_and_set_bit(ICE_SUSPENDED, pf->state)) {
@@ -5644,6 +5644,11 @@ static int ice_resume(struct device *dev)
 	if (ret)
 		dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret);

+	ret = ice_init_rdma(pf);
+	if (ret)
+		dev_err(dev, "Reinitialize RDMA during resume failed: %d\n",
+			ret);
+
 	clear_bit(ICE_DOWN, pf->state);
 	/* Now perform PF reset and rebuild */
 	reset_type = ICE_RESET_PFR;
-- 
2.41.0