Message-ID: <20230711213501.526237-1-andrealmeid@igalia.com>
Date: Tue, 11 Jul 2023 18:34:55 -0300
From: André Almeida <andrealmeid@...lia.com>
To: dri-devel@...ts.freedesktop.org, amd-gfx@...ts.freedesktop.org,
linux-kernel@...r.kernel.org
Cc: kernel-dev@...lia.com, alexander.deucher@....com,
christian.koenig@....com, pierre-eric.pelloux-prayer@....com,
'Marek Olšák' <maraeo@...il.com>,
Samuel Pitoiset <samuel.pitoiset@...il.com>,
Bas Nieuwenhuizen <bas@...nieuwenhuizen.nl>,
Timur Kristóf <timur.kristof@...il.com>,
michel.daenzer@...lbox.org,
André Almeida <andrealmeid@...lia.com>
Subject: [PATCH 0/6] drm/amdgpu: Add new reset option and rework coredump
Hi,
The goal of this patchset is to improve debugging device resets on amdgpu.
The first patch creates a new module parameter to disable soft recoveries,
ensuring every recovery goes through the full device reset. This makes it easier
to generate resets from userspace tools like [0] and [1], which is important for
validating how the stack behaves on resets, end to end.
The second patch is a small addition to mark as guilty the jobs that cause soft
recoveries, for API consistency.
The last patches rework the devcoredump files to store more information,
making them more useful to attach to bug reports.
The new coredump content looks like this:
**** AMDGPU Device Coredump ****
version: 1
kernel: 6.4.0-rc7-tony+
module: amdgpu
time: 702.743534320
process_name: vulkan-triangle PID: 4561
IBs:
[0] 0xffff800100545000
[1] 0xffff800100001000
ring name: gfx_0.0.0
Due to nested IBs, this may not be the one that really caused the hang, but it
gives some direction.
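For anyone who wants to pull these fields out of a dump attached to a bug
report, here is a minimal Python sketch. It assumes only the text layout shown
in the sample above; the function name parse_coredump and the dictionary keys
are illustrative and not part of the patchset, and the layout may change with
later coredump versions.

```python
import re

# Sample text copied from the coredump shown in this cover letter.
SAMPLE = """\
**** AMDGPU Device Coredump ****
version: 1
kernel: 6.4.0-rc7-tony+
module: amdgpu
time: 702.743534320
process_name: vulkan-triangle PID: 4561
IBs:
[0] 0xffff800100545000
[1] 0xffff800100001000
ring name: gfx_0.0.0
"""


def parse_coredump(text):
    """Parse the sample layout into a dict (hypothetical helper)."""
    info = {"ibs": []}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the banner line.
        if not line or line.startswith("****"):
            continue
        # IB entries look like "[0] 0xffff800100545000".
        ib = re.fullmatch(r"\[(\d+)\]\s+(0x[0-9a-fA-F]+)", line)
        if ib:
            info["ibs"].append(int(ib.group(2), 16))
            continue
        # The process line carries two fields: the name and the PID.
        proc = re.fullmatch(r"process_name:\s*(.*?)\s+PID:\s*(\d+)", line)
        if proc:
            info["process_name"] = proc.group(1)
            info["pid"] = int(proc.group(2))
            continue
        # Everything else is "key: value"; a bare "IBs:" header has no
        # value and is skipped.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip().replace(" ", "_")] = value.strip()
    return info


if __name__ == "__main__":
    for key, value in parse_coredump(SAMPLE).items():
        print(key, value)
```

The values are kept as strings except for the PID and the IB addresses, which
are converted to integers so they can be compared or sorted.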
Thanks,
André
[0] https://gitlab.freedesktop.org/andrealmeid/gpu-timeout
[1] https://github.com/andrealmeid/vulkan-triangle-v1
André Almeida (6):
drm/amdgpu: Create a module param to disable soft recovery
drm/amdgpu: Mark contexts guilty for causing soft recoveries
drm/amdgpu: Rework coredump to use memory dynamically
drm/amdgpu: Limit info in coredump for kernel threads
drm/amdgpu: Log IBs and ring name at coredump
drm/amdgpu: Create version number for coredumps
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 21 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 6 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 99 +++++++++++++++++-----
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 6 +-
5 files changed, 112 insertions(+), 29 deletions(-)
--
2.41.0