Message-ID: <1524583836-12130-1-git-send-email-andrey.grodzovsky@amd.com>
Date: Tue, 24 Apr 2018 11:30:33 -0400
From: Andrey Grodzovsky <andrey.grodzovsky@....com>
To: <linux-kernel@...r.kernel.org>, <amd-gfx@...ts.freedesktop.org>
CC: <Alexander.Deucher@....com>, <Christian.Koenig@....com>,
<David.Panariti@....com>, <oleg@...hat.com>,
<akpm@...ux-foundation.org>, <ebiederm@...ssion.com>
Subject: Avoid uninterruptible sleep during process exit
The following 3 patches address an issue we encounter in the AMDGPU driver.
When a GPU pipe stalls for some reason (shader code error, incorrectly programmed registers, etc.),
an uninterruptible wait in the kernel leaves the user process in an unresponsive state
that can only be remedied by a hard reset of the system.
Each patch addresses a different case of this problem.
The first is normal exit (not from signal processing): the change in
kernel/signal.c allows a KILL signal to be propagated to a process marked as exiting.
The second is exit caused by an unhandled signal during signal
processing: it avoids waiting for SIGKILL when called from
...->do_signal->get_signal->do_group_exit->do_exit->...->wait_event_killable
The third is not related to process exit; it just avoids an uninterruptible wait
for completion of a particular job on the GPU pipe.
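To illustrate the difference between the two kinds of wait discussed above, here is a minimal userspace sketch in C. It is not code from the patches and does not use any kernel API: model_task, model_job, model_wait and MODEL_ERESTARTSYS are hypothetical names invented for this example, and a bounded polling loop stands in for a sleep that would otherwise never end. It only shows that a wait which checks for a pending fatal signal can bail out when the job never completes, while an uninterruptible one cannot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_ERESTARTSYS 512	/* same numeric value as the kernel's ERESTARTSYS */

enum wait_mode { WAIT_UNINTERRUPTIBLE, WAIT_KILLABLE };

struct model_task {
	bool fatal_signal_pending;	/* stands in for a pending SIGKILL */
};

struct model_job {
	int ticks_until_done;		/* negative: the pipe is hung, job never completes */
};

/*
 * Poll the job for at most max_ticks iterations.
 * Returns 0 if the job completed, -MODEL_ERESTARTSYS if a killable wait
 * was aborted by a pending fatal signal, and -1 if the tick budget ran
 * out (the model's stand-in for "blocked forever").
 */
int model_wait(const struct model_task *t, const struct model_job *j,
	       enum wait_mode mode, int max_ticks)
{
	assert(t != NULL && j != NULL);

	for (int tick = 0; tick < max_ticks; tick++) {
		/* The condition is checked first, as a real wait loop would do. */
		if (j->ticks_until_done >= 0 && tick >= j->ticks_until_done)
			return 0;
		/* Only the killable variant looks at the pending signal. */
		if (mode == WAIT_KILLABLE && t->fatal_signal_pending)
			return -MODEL_ERESTARTSYS;
	}
	return -1;
}
```

With a hung job (ticks_until_done < 0) and a pending fatal signal, the WAIT_UNINTERRUPTIBLE call uses up its whole tick budget, while the WAIT_KILLABLE call returns -MODEL_ERESTARTSYS on the first iteration.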
P.S. I am sending this to the kernel mailing list mainly because of the first patch;
the other two are intended more for amd-gfx@...ts.freedesktop.org and
are included here just to provide more context for the problem we are trying to solve.

Andrey Grodzovsky (3):
  signals: Allow generation of SIGKILL to exiting task.
  drm/scheduler: Don't call wait_event_killable for signaled process.
  drm/amdgpu: Switch to interrupted wait to recover from ring hang.

 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 14 ++++++++++----
 drivers/gpu/drm/scheduler/gpu_scheduler.c |  5 +++--
 kernel/signal.c                           |  4 ++--
 3 files changed, 15 insertions(+), 8 deletions(-)