[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250601232937.3510379-8-sashal@kernel.org>
Date: Sun, 1 Jun 2025 19:28:00 -0400
From: Sasha Levin <sashal@...nel.org>
To: patches@...ts.linux.dev,
stable@...r.kernel.org
Cc: "Jesse.Zhang" <Jesse.Zhang@....com>,
Jesse Zhang <jesse.zhang@....com>,
"Shaoyun . liu" <Shaoyun.liu@....com>,
Prike Liang <Prike.Liang@....com>,
Alex Deucher <alexander.deucher@....com>,
Sasha Levin <sashal@...nel.org>,
christian.koenig@....com,
airlied@...il.com,
simona@...ll.ch,
sunil.khatri@....com,
Jack.Xiao@....com,
srinivasan.shanmugam@....com,
shaoyun.liu@....com,
Jiadong.Zhu@....com,
Hawking.Zhang@....com,
michael.chen@....com,
amd-gfx@...ts.freedesktop.org,
dri-devel@...ts.freedesktop.org,
linux-kernel@...r.kernel.org
Subject: [PATCH AUTOSEL 6.14 008/102] drm/amdgpu: Fix API status offset for MES queue reset
From: "Jesse.Zhang" <Jesse.Zhang@....com>
[ Upstream commit ad7c088e31f026d71fe87fd09473fafb7d6ed006 ]
The mes_v11_0_reset_hw_queue and mes_v12_0_reset_hw_queue functions were
using the wrong union type (MESAPI__REMOVE_QUEUE) when getting the offset
for api_status. Since these functions handle queue reset operations, they
should use MESAPI__RESET union instead.
This fixes the polling of API status during hardware queue reset operations
in the MES for both v11 and v12 versions.
Signed-off-by: Jesse Zhang <jesse.zhang@....com>
Reviewed-By: Shaoyun.liu <Shaoyun.liu@....com>
Reviewed-by: Prike Liang <Prike.Liang@....com>
Signed-off-by: Alex Deucher <alexander.deucher@....com>
Signed-off-by: Sasha Levin <sashal@...nel.org>
---
**YES** This commit should be backported to stable kernel trees based on
the following analysis: ## Bug Analysis The commit fixes a clear
functional bug where `mes_v11_0_reset_hw_queue()` and
`mes_v12_0_reset_hw_queue()` functions were using the wrong union type
(`MESAPI__REMOVE_QUEUE`) to get the API status offset for polling
completion. Since these functions handle **reset** operations, they
should use the `MESAPI__RESET` union instead. ## Code Changes Assessment
1. **Small and contained**: The fix changes only 2 lines across 2 files:
- `mes_v11_0.c:480`: `offsetof(union MESAPI__REMOVE_QUEUE, api_status)`
→ `offsetof(union MESAPI__RESET, api_status)` - `mes_v12_0.c:500`: Same
change 2. **Logical correctness**: The functions create `union
MESAPI__RESET mes_reset_queue_pkt` packets but were polling using the
wrong union offset. This is a clear mismatch that could cause incorrect
polling behavior. 3. **No architectural changes**: This is purely a bug
fix that aligns the API status polling with the correct packet type. ##
Impact Assessment - **Affects GPU queue management**: MES (Micro Engine
Scheduler) queue reset is a critical operation for AMD GPUs -
**Potential for silent failures**: Wrong offset polling could lead to
improper completion detection - **User-visible impact**: Could cause GPU
hangs or incorrect queue state management - **Low regression risk**: The
fix aligns the code with its intended design ## Comparison with Similar
Commits All 5 provided similar commits were marked "NO" for backporting
because they: - Added new APIs/features (commits #1, #2) - Made
functional changes to queue reset behavior (commits #3, #4, #5) In
contrast, this commit: - Fixes an existing bug without adding features -
Maintains existing behavior while correcting implementation - Follows
the stable tree rule of "important bugfixes with minimal risk" ## Stable
Tree Criteria Met ✅ **Important bugfix**: Fixes incorrect API status
polling in GPU queue reset ✅ **Minimal risk**: Two-line change with
clear logic ✅ **No new features**: Pure correctness fix ✅ **Contained
change**: Limited to MES driver subsystem ✅ **Clear side effects**:
Improves reliability of queue reset operations This commit represents
exactly the type of focused bug fix that stable trees are designed to
include.
drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 2 +-
drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 68bb334393bb6..ec7ef8763f932 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -477,7 +477,7 @@ static int mes_v11_0_reset_hw_queue(struct amdgpu_mes *mes,
return mes_v11_0_submit_pkt_and_poll_completion(mes,
&mes_reset_queue_pkt, sizeof(mes_reset_queue_pkt),
- offsetof(union MESAPI__REMOVE_QUEUE, api_status));
+ offsetof(union MESAPI__RESET, api_status));
}
static int mes_v11_0_map_legacy_queue(struct amdgpu_mes *mes,
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index 6b121c2723d66..53d059a2a42e0 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -500,7 +500,7 @@ static int mes_v12_0_reset_hw_queue(struct amdgpu_mes *mes,
return mes_v12_0_submit_pkt_and_poll_completion(mes, pipe,
&mes_reset_queue_pkt, sizeof(mes_reset_queue_pkt),
- offsetof(union MESAPI__REMOVE_QUEUE, api_status));
+ offsetof(union MESAPI__RESET, api_status));
}
static int mes_v12_0_map_legacy_queue(struct amdgpu_mes *mes,
--
2.39.5
Powered by blists - more mailing lists