[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20230401014242.3356780-22-sashal@kernel.org>
Date: Fri, 31 Mar 2023 21:42:38 -0400
From: Sasha Levin <sashal@...nel.org>
To: linux-kernel@...r.kernel.org, stable@...r.kernel.org
Cc: YuBiao Wang <YuBiao.Wang@....com>,
Luben Tuikov <luben.tuikov@....com>,
Alex Deucher <alexander.deucher@....com>,
Sasha Levin <sashal@...nel.org>, christian.koenig@....com,
Xinhui.Pan@....com, airlied@...il.com, daniel@...ll.ch,
andrey.grodzovsky@....com, Jiadong.Zhu@....com,
gpiccoli@...lia.com, Jack.Xiao@....com, Likun.Gao@....com,
slark_xiao@....com, amd-gfx@...ts.freedesktop.org,
dri-devel@...ts.freedesktop.org
Subject: [PATCH AUTOSEL 6.1 22/24] drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs
From: YuBiao Wang <YuBiao.Wang@....com>
[ Upstream commit 033c56474acf567a450f8bafca50e0b610f2b716 ]
[Why]
For engines not supporting soft reset, i.e. VCN, there will be a failed
ib test before mode 1 reset during asic reset. The fences in this case
are never signaled and next time when we try to free the sa_bo, kernel
will hang.
[How]
During pre_asic_reset, driver will clear job fences and afterwards the
fences' refcount will be reduced to 1. For drm_sched_jobs it will be
released in job_free_cb, and for non-sched jobs like ib_test, it's meant
to be released in sa_bo_free but only when the fences are signaled. So
we have to force signal the non_sched bad job's fence during
pre_asic_reset or the clear is not complete.
Signed-off-by: YuBiao Wang <YuBiao.Wang@....com>
Acked-by: Luben Tuikov <luben.tuikov@....com>
Signed-off-by: Alex Deucher <alexander.deucher@....com>
Signed-off-by: Sasha Levin <sashal@...nel.org>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 6fdb679321d0d..3cc1929285fc0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -624,6 +624,15 @@ void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring)
ptr = &ring->fence_drv.fences[i];
old = rcu_dereference_protected(*ptr, 1);
if (old && old->ops == &amdgpu_job_fence_ops) {
+ struct amdgpu_job *job;
+
+ /* For non-scheduler bad job, i.e. failed ib test, we need to signal
+ * it right here or we won't be able to track them in fence_drv
+ * and they will remain unsignaled during sa_bo free.
+ */
+ job = container_of(old, struct amdgpu_job, hw_fence);
+ if (!job->base.s_fence && !dma_fence_is_signaled(old))
+ dma_fence_signal(old);
RCU_INIT_POINTER(*ptr, NULL);
dma_fence_put(old);
}
--
2.39.2
Powered by blists - more mailing lists