[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20220122195731.934494-17-ogabbay@kernel.org>
Date: Sat, 22 Jan 2022 21:57:18 +0200
From: Oded Gabbay <ogabbay@...nel.org>
To: linux-kernel@...r.kernel.org
Subject: [PATCH 17/30] habanalabs: there is no kernel TDR in future ASICs
In future ASICs, there is no kernel TDR for new workloads that are
submitted directly from user-space to the device.
Therefore, the driver can NEVER know that a workload has timed-out.
So, when the user asks us to wait for interrupt on the workload's
completion, and the wait has timed-out, it doesn't mean the workload
has timed-out. It only means the wait has timed-out, which is NOT an
error from driver's perspective.
Signed-off-by: Oded Gabbay <ogabbay@...nel.org>
---
.../misc/habanalabs/common/command_submission.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/drivers/misc/habanalabs/common/command_submission.c b/drivers/misc/habanalabs/common/command_submission.c
index 96ec1fd3882a..0f41c283082c 100644
--- a/drivers/misc/habanalabs/common/command_submission.c
+++ b/drivers/misc/habanalabs/common/command_submission.c
@@ -2941,11 +2941,14 @@ static int _hl_interrupt_wait_ioctl(struct hl_device *hdev, struct hl_ctx *ctx,
rc = -EIO;
*status = HL_WAIT_CS_STATUS_ABORTED;
} else {
- dev_err_ratelimited(hdev->dev, "Waiting for interrupt ID %d timedout\n",
- interrupt->interrupt_id);
- rc = -ETIMEDOUT;
+ /* The wait has timed-out. We don't know anything beyond that
+ * because the workload wasn't submitted through the driver.
+ * Therefore, from driver's perspective, the workload is still
+ * executing.
+ */
+ rc = 0;
+ *status = HL_WAIT_CS_STATUS_BUSY;
}
- *status = HL_WAIT_CS_STATUS_BUSY;
}
}
@@ -3058,6 +3061,12 @@ static int _hl_interrupt_wait_ioctl_user_addr(struct hl_device *hdev, struct hl_
interrupt->interrupt_id);
rc = -EINTR;
} else {
+ /* The wait has timed-out. We don't know anything beyond that
+ * because the workload wasn't submitted through the driver.
+ * Therefore, from driver's perspective, the workload is still
+ * executing.
+ */
+ rc = 0;
*status = HL_WAIT_CS_STATUS_BUSY;
}
--
2.25.1
Powered by blists - more mailing lists