[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20260210091417.1133-1-pierre-eric.pelloux-prayer@amd.com>
Date: Tue, 10 Feb 2026 10:14:16 +0100
From: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@....com>
To: Alex Deucher <alexander.deucher@....com>,
Christian König <christian.koenig@....com>, David Airlie
<airlied@...il.com>, Simona Vetter <simona@...ll.ch>
CC: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@....com>, "Simona
Vetter" <simona.vetter@...ll.ch>, <amd-gfx@...ts.freedesktop.org>,
<dri-devel@...ts.freedesktop.org>, <linux-kernel@...r.kernel.org>
Subject: [PATCH v1] drm/amdgpu: fix sync handling in amdgpu_dma_buf_move_notify
Invalidating a dmabuf will impact other users of the shared BO.
In the scenario where process A moves the BO, it needs to inform
process B about the move and process B will need to update its
page table.
The commit fixes a synchronisation bug caused by the use of the
ticket: it made amdgpu_vm_handle_moved behave as if updating
the page table immediately was correct but in this case it's not.
An example is the following scenario, with 2 GPUs and glxgears
running on GPU0 and Xorg running on GPU1, on a system where P2P
PCI isn't supported:
glxgears:
export linear buffer from GPU0 and import using GPU1
submit frame rendering to GPU0
submit tiled->linear blit
Xorg:
copy of linear buffer
The sequence of jobs would be:
drm_sched_job_run # GPU0, frame rendering
drm_sched_job_queue # GPU0, blit
drm_sched_job_done # GPU0, frame rendering
drm_sched_job_run # GPU0, blit
move linear buffer for GPU1 access #
amdgpu_dma_buf_move_notify -> update pt # GPU0
It this point the blit job on GPU0 is still running and would
likely produce a page fault.
Fixes: a448cb003edc ("drm/amdgpu: implement amdgpu_gem_prime_move_notify v2")
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@....com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index b9c38a4fe546..656c267dbe58 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -514,8 +514,15 @@ amdgpu_dma_buf_move_notify(struct dma_buf_attachment *attach)
r = dma_resv_reserve_fences(resv, 2);
if (!r)
r = amdgpu_vm_clear_freed(adev, vm, NULL);
+
+ /* Don't pass 'ticket' to amdgpu_vm_handle_moved: we want the clear=true
+ * path to be used otherwise we might update the PT of another process
+ * while it's using the BO.
+ * With clear=true, amdgpu_vm_bo_update will sync to command submission
+ * from the same VM.
+ */
if (!r)
- r = amdgpu_vm_handle_moved(adev, vm, ticket);
+ r = amdgpu_vm_handle_moved(adev, vm, NULL);
if (r && r != -EBUSY)
DRM_ERROR("Failed to invalidate VM page tables (%d))\n",
--
2.43.0
Powered by blists - more mailing lists