Message-Id: <20211226124304.237107-2-ogabbay@kernel.org>
Date:   Sun, 26 Dec 2021 14:43:01 +0200
From:   Oded Gabbay <ogabbay@...nel.org>
To:     linux-kernel@...r.kernel.org
Cc:     Ohad Sharabi <osharabi@...ana.ai>
Subject: [PATCH 2/5] habanalabs: handle skip multi-CS if handling not done
From: Ohad Sharabi <osharabi@...ana.ai>
This patch fixes an issue in which a multi-CS wait times out although
a CS in the list has actually completed.
Example scenario (the two threads are marked WAIT, for the thread that
handles wait_for_multi_cs, and CMPL, for the thread that signals
completion for both the CS and the multi-CS):
1. Submit CS with sequence X
2. [WAIT]: call wait_for_multi_cs with a single CS, X
3. [CMPL]: CS X invokes complete_all for both the CS and the multi-CS
           (mcs_handling_done is still false)
4. [WAIT]: enter poll_fences and reinit the completion; the fence
           reports the CS as completed, but since mcs_handling_done is
           still false, poll_fences returns that no CS has completed
5. [CMPL]: set mcs_handling_done to true
6. [WAIT]: wait for the completion, but no CS is left to wake the wait
           context, so it waits until the timeout
Solution: if a CS is detected as completed in poll_fences but
          mcs_handling_done is still false, invoke complete_all on the
          multi-CS completion. The waiter then does not go to sleep in
          wait_for_completion; instead it gets a "second chance" to
          observe mcs_handling_done.
Signed-off-by: Ohad Sharabi <osharabi@...ana.ai>
Reviewed-by: Oded Gabbay <ogabbay@...nel.org>
Signed-off-by: Oded Gabbay <ogabbay@...nel.org>
---
 drivers/misc/habanalabs/common/command_submission.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/misc/habanalabs/common/command_submission.c b/drivers/misc/habanalabs/common/command_submission.c
index 7073fa6b9f0f..d39343f90bc2 100644
--- a/drivers/misc/habanalabs/common/command_submission.c
+++ b/drivers/misc/habanalabs/common/command_submission.c
@@ -2453,9 +2453,19 @@ static int hl_cs_poll_fences(struct multi_cs_data *mcs_data, struct multi_cs_com
 			 * returns to user indicating CS completed before it finished
 			 * all of its mcs handling, to avoid race the next time the
 			 * user waits for mcs.
+			 * note: when reaching this case fence is definitely not NULL
+			 *       but NULL check was added to overcome static analysis
 			 */
-			if (!fence->mcs_handling_done)
+			if (fence && !fence->mcs_handling_done) {
+				/*
+				 * in case multi CS is completed but MCS handling not done
+				 * we "complete" the multi CS to prevent it from waiting
+				 * until time-out and the "multi-CS handling done" will have
+				 * another chance at the next iteration
+				 */
+				complete_all(&mcs_compl->completion);
 				break;
+			}
 
 			mcs_data->completion_bitmap |= BIT(i);
 			/*
-- 
2.25.1
