Message-Id: <20240619063425.1377327-5-obitton@habana.ai>
Date: Wed, 19 Jun 2024 09:34:21 +0300
From: Ofir Bitton <obitton@...ana.ai>
To: dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Cc: Tomer Tayar <ttayar@...ana.ai>
Subject: [PATCH 5/9] accel/habanalabs: revise print on EQ heartbeat failure

From: Tomer Tayar <ttayar@...ana.ai>

Don't print the "previous EQ index" value in case of an EQ heartbeat
failure, because it is incremented along with the EQ CI and is therefore
redundant.

In addition, as the CPU-CP PI is zeroed when it reaches a value that is
twice the queue size, also print the CI with the same wrap-around
applied, to make it easier to compare the two values.

Signed-off-by: Tomer Tayar <ttayar@...ana.ai>
Reviewed-by: Ofir Bitton <obitton@...ana.ai>
---
 drivers/accel/habanalabs/common/device.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)
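
Note on the wrap-around: the sketch below illustrates why masking the CI
with (HL_QUEUE_LENGTH << 1) - 1 makes it directly comparable to a PI that
is zeroed at twice the queue size. It is a standalone illustration, not
driver code. The helper names and EXAMPLE_QUEUE_LENGTH are hypothetical,
and it assumes the queue length is a power of two and that the CI keeps
counting past the point where the PI wraps.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: illustrative length only; the driver's HL_QUEUE_LENGTH may differ. */
#define EXAMPLE_QUEUE_LENGTH 4096u

/* Mask keeping a counter in [0, 2 * queue_len); valid for power-of-two lengths only. */
static inline uint32_t pq_pi_mask(uint32_t queue_len)
{
	assert(queue_len != 0 && (queue_len & (queue_len - 1)) == 0);
	return (queue_len << 1) - 1;
}

/* Fold a CI value into the same range as a PI that wraps at 2 * queue_len. */
static inline uint32_t wrap_ci(uint32_t ci, uint32_t queue_len)
{
	return ci & pq_pi_mask(queue_len);
}
```

With a queue length of 4096, a CI of 8195 is folded to 3, which can be
compared at a glance with a PI that restarted from zero at 8192.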

diff --git a/drivers/accel/habanalabs/common/device.c b/drivers/accel/habanalabs/common/device.c
index 2fa6bf4c97af..3efc26dd9497 100644
--- a/drivers/accel/habanalabs/common/device.c
+++ b/drivers/accel/habanalabs/common/device.c
@@ -1064,23 +1064,24 @@ static bool is_pci_link_healthy(struct hl_device *hdev)
 
 static bool hl_device_eq_heartbeat_received(struct hl_device *hdev)
 {
+	struct eq_heartbeat_debug_info *heartbeat_debug_info = &hdev->heartbeat_debug_info;
+	u32 cpu_q_id = heartbeat_debug_info->cpu_queue_id, pq_pi_mask = (HL_QUEUE_LENGTH << 1) - 1;
 	struct asic_fixed_properties *prop = &hdev->asic_prop;
-	u32 cpu_q_id;
 
 	if (!prop->cpucp_info.eq_health_check_supported)
 		return true;
 
 	if (!hdev->eq_heartbeat_received) {
-		cpu_q_id = hdev->heartbeat_debug_info.cpu_queue_id;
-
 		dev_err(hdev->dev, "EQ heartbeat event was not received!\n");
 
-		dev_err(hdev->dev, "Heartbeat events counter: %u, Q_PI: %u, Q_CI: %u, EQ CI: %u, EQ prev: %u\n",
-				hdev->heartbeat_debug_info.heartbeat_event_counter,
-				hdev->kernel_queues[cpu_q_id].pi,
-				atomic_read(&hdev->kernel_queues[cpu_q_id].ci),
-				hdev->event_queue.ci,
-				hdev->event_queue.prev_eqe_index);
+		dev_err(hdev->dev,
+			"Heartbeat events counter: %u, EQ CI: %u, PQ PI: %u, PQ CI: %u (%u)\n",
+			heartbeat_debug_info->heartbeat_event_counter,
+			hdev->event_queue.ci,
+			hdev->kernel_queues[cpu_q_id].pi,
+			atomic_read(&hdev->kernel_queues[cpu_q_id].ci),
+			atomic_read(&hdev->kernel_queues[cpu_q_id].ci) & pq_pi_mask);
+
 		return false;
 	}
 
-- 
2.34.1

