Date:   Thu,  8 Dec 2022 17:13:28 +0200
From:   Oded Gabbay <ogabbay@...nel.org>
To:     linux-kernel@...r.kernel.org
Cc:     Tomer Tayar <ttayar@...ana.ai>
Subject: [PATCH 04/26] habanalabs: abort waiting user threads upon error

From: Tomer Tayar <ttayar@...ana.ai>

The user should close the FD upon being notified about an error, after
which a device reset takes place.

However, if the user has threads that are still waiting for completions,
the device release won't be called and eventually the watchdog timeout
will expire, leading to a hard reset and the killing of the user process.

To avoid this, abort such waiting threads right after the error
notification, and block any subsequent wait operations.

Signed-off-by: Tomer Tayar <ttayar@...ana.ai>
Reviewed-by: Oded Gabbay <ogabbay@...nel.org>
Signed-off-by: Oded Gabbay <ogabbay@...nel.org>
---
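Reading aid, not part of the patch: a minimal userspace sketch of the
two behaviours the commit message describes (fail every pending waiter
with -EIO, then reject new waits with -EBUSY). All model_* names are
hypothetical and do not exist in the driver. The sketch also assumes
that the "release expected" flag is raised on the error path, which
this patch does not itself show.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PENDING 8

/* Hypothetical stand-in for a CS fence with its completion. */
struct model_fence {
	bool completed;	/* models struct completion being done */
	int error;	/* 0 on success, negative errno on failure */
};

/* Hypothetical stand-in for the device state relevant here. */
struct model_device {
	struct model_fence *pending[MODEL_MAX_PENDING]; /* models cs_mirror_list */
	size_t nr_pending;
	bool release_expected; /* models reset_info.watchdog_active */
};

/* Entry check of a wait operation: refuse once an error was reported. */
static int model_wait_begin(struct model_device *dev, struct model_fence *f)
{
	assert(dev && f);

	if (dev->release_expected)
		return -EBUSY;
	if (dev->nr_pending == MODEL_MAX_PENDING)
		return -ENOMEM;

	f->completed = false;
	f->error = 0;
	dev->pending[dev->nr_pending++] = f;
	return 0;
}

/* Error path: mark every pending fence as failed so its waiters wake up. */
static void model_abort_waiters(struct model_device *dev)
{
	size_t i;

	assert(dev);

	dev->release_expected = true; /* assumption: set on error notification */
	for (i = 0; i < dev->nr_pending; i++) {
		dev->pending[i]->error = -EIO;
		dev->pending[i]->completed = true;
	}
}
```

In this model, a fence queued before model_abort_waiters() ends up
completed with error -EIO, and a later model_wait_begin() returns
-EBUSY, so nothing keeps the FD from being closed.
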
 .../habanalabs/common/command_submission.c    | 28 +++++++++++++++++--
 drivers/misc/habanalabs/common/device.c       |  2 ++
 drivers/misc/habanalabs/common/habanalabs.h   |  1 +
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/drivers/misc/habanalabs/common/command_submission.c b/drivers/misc/habanalabs/common/command_submission.c
index ea0e5101c10e..cf3b82efc65c 100644
--- a/drivers/misc/habanalabs/common/command_submission.c
+++ b/drivers/misc/habanalabs/common/command_submission.c
@@ -1117,6 +1117,27 @@ void hl_release_pending_user_interrupts(struct hl_device *hdev)
 	wake_pending_user_interrupt_threads(interrupt);
 }
 
+static void force_complete_cs(struct hl_device *hdev)
+{
+	struct hl_cs *cs;
+
+	spin_lock(&hdev->cs_mirror_lock);
+
+	list_for_each_entry(cs, &hdev->cs_mirror_list, mirror_node) {
+		cs->fence->error = -EIO;
+		complete_all(&cs->fence->completion);
+	}
+
+	spin_unlock(&hdev->cs_mirror_lock);
+}
+
+void hl_abort_waitings_for_completion(struct hl_device *hdev)
+{
+	force_complete_cs(hdev);
+	force_complete_multi_cs(hdev);
+	hl_release_pending_user_interrupts(hdev);
+}
+
 static void job_wq_completion(struct work_struct *work)
 {
 	struct hl_cs_job *job = container_of(work, struct hl_cs_job,
@@ -3489,14 +3510,15 @@ static int hl_interrupt_wait_ioctl(struct hl_fpriv *hpriv, void *data)
 
 int hl_wait_ioctl(struct hl_fpriv *hpriv, void *data)
 {
+	struct hl_device *hdev = hpriv->hdev;
 	union hl_wait_cs_args *args = data;
 	u32 flags = args->in.flags;
 	int rc;
 
-	/* If the device is not operational, no point in waiting for any command submission or
-	 * user interrupt
+	/* If the device is not operational, or if an error has happened and user should release the
+	 * device, there is no point in waiting for any command submission or user interrupt.
 	 */
-	if (!hl_device_operational(hpriv->hdev, NULL))
+	if (!hl_device_operational(hpriv->hdev, NULL) || hdev->reset_info.watchdog_active)
 		return -EBUSY;
 
 	if (flags & HL_WAIT_CS_FLAGS_INTERRUPT)
diff --git a/drivers/misc/habanalabs/common/device.c b/drivers/misc/habanalabs/common/device.c
index 1453f2ec72d9..92721111b652 100644
--- a/drivers/misc/habanalabs/common/device.c
+++ b/drivers/misc/habanalabs/common/device.c
@@ -1865,6 +1865,8 @@ int hl_device_cond_reset(struct hl_device *hdev, u32 flags, u64 event_mask)
 
 	hl_ctx_put(ctx);
 
+	hl_abort_waitings_for_completion(hdev);
+
 	return 0;
 
 device_reset:
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index 9e42d0e9ce33..7fb45610ad0c 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -3791,6 +3791,7 @@ void hl_dec_fini(struct hl_device *hdev);
 void hl_dec_ctx_fini(struct hl_ctx *ctx);
 
 void hl_release_pending_user_interrupts(struct hl_device *hdev);
+void hl_abort_waitings_for_completion(struct hl_device *hdev);
 int hl_cs_signal_sob_wraparound_handler(struct hl_device *hdev, u32 q_idx,
 			struct hl_hw_sob **hw_sob, u32 count, bool encaps_sig);
 
-- 
2.25.1
