Message-ID: <1524583836-12130-1-git-send-email-andrey.grodzovsky@amd.com>
Date:   Tue, 24 Apr 2018 11:30:33 -0400
From:   Andrey Grodzovsky <andrey.grodzovsky@....com>
To:     <linux-kernel@...r.kernel.org>, <amd-gfx@...ts.freedesktop.org>
CC:     <Alexander.Deucher@....com>, <Christian.Koenig@....com>,
        <David.Panariti@....com>, <oleg@...hat.com>,
        <akpm@...ux-foundation.org>, <ebiederm@...ssion.com>
Subject: Avoid uninterruptible sleep during process exit

The following three patches address an issue we encounter in the AMDGPU driver.

When a GPU pipe stalls for some reason (shader code error, incorrectly
programmed registers, etc.), an uninterruptible wait in the kernel leaves
the user process unresponsive, which can only be remedied by a hard reset
of the system.

Each patch addresses a different case of this problem.

The first covers normal exit (not from signal processing): the change in
kernel/signal.c allows a KILL signal to propagate to a process marked as exiting.

The second covers exit due to death from an unhandled signal during signal
processing: it avoids waiting for SIGKILL when called from
...->do_signal->get_signal->do_group_exit->do_exit->...->wait_event_killable

The third is not related to process exit; it just avoids an uninterruptible
wait for completion of a particular job on the GPU pipe.

P.S. I am sending this to the kernel mailing list mainly because of the first
patch; the other two are intended more for amd-gfx@...ts.freedesktop.org and
are included here just to provide more context for the problem we are trying
to solve.

Andrey Grodzovsky (3):

signals: Allow generation of SIGKILL to exiting task.   
drm/scheduler: Don't call wait_event_killable for signaled process.   
drm/amdgpu: Switch to interrupted wait to recover from ring hang.

drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 14 ++++++++++----
drivers/gpu/drm/scheduler/gpu_scheduler.c |  5 +++--
kernel/signal.c                           |  4 ++--
3 files changed, 15 insertions(+), 8 deletions(-)
