linux-kernel - Re: [PATCH 2/3] drm/scheduler: Don't call wait_event




Message-ID: <8840ac96-50c4-f94d-eb7c-f007940163f3@amd.com>
Date:   Tue, 24 Apr 2018 12:43:28 -0400
From:   Andrey Grodzovsky <Andrey.Grodzovsky@....com>
To:     "Eric W. Biederman" <ebiederm@...ssion.com>
Cc:     linux-kernel@...r.kernel.org, amd-gfx@...ts.freedesktop.org,
        Alexander.Deucher@....com, Christian.Koenig@....com,
        David.Panariti@....com, oleg@...hat.com, akpm@...ux-foundation.org
Subject: Re: [PATCH 2/3] drm/scheduler: Don't call wait_event_killable for
 signaled process.



On 04/24/2018 12:23 PM, Eric W. Biederman wrote:
> Andrey Grodzovsky <andrey.grodzovsky@....com> writes:
>
>> Avoid calling wait_event_killable when you are possibly being called
>> from the get_signal routine, since in that case you end up in a deadlock
>> where you are already blocked in signal processing and trying to wait
>> on a new signal.
> I am curious what the problematic call path is here.

Here is the problematic call stack:

[<0>] drm_sched_entity_fini+0x10a/0x3a0 [gpu_sched]
[<0>] amdgpu_ctx_do_release+0x129/0x170 [amdgpu]
[<0>] amdgpu_ctx_mgr_fini+0xd5/0xe0 [amdgpu]
[<0>] amdgpu_driver_postclose_kms+0xcd/0x440 [amdgpu]
[<0>] drm_release+0x414/0x5b0 [drm]
[<0>] __fput+0x176/0x350
[<0>] task_work_run+0xa1/0xc0
[<0>] do_exit+0x48f/0x1280
[<0>] do_group_exit+0x89/0x140
[<0>] get_signal+0x375/0x8f0
[<0>] do_signal+0x79/0xaa0
[<0>] exit_to_usermode_loop+0x83/0xd0
[<0>] do_syscall_64+0x244/0x270
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff

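On return from the system call, the pending signals are processed and a
fatal signal is found, which triggers process termination
(do_group_exit() -> do_exit()). The task_work run during exit then
releases the DRM file, which ends up in drm_sched_entity_fini().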
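To make the hang concrete, here is a simplified userspace model in C.
It assumes that get_signal() has already dequeued the fatal signal
before do_exit() runs the task_work that ends up in
drm_sched_entity_fini(). struct task_model, model_get_signal(),
model_wait_killable() and the MODEL_* constants are hypothetical names
that only mimic the order of checks in a killable wait; this is not the
kernel implementation.

```c
#include <stdbool.h>

/* Values defined locally for this model. ERESTARTSYS is a kernel-internal
 * errno (512) that userspace headers do not provide. */
#define MODEL_ERESTARTSYS 512
#define MODEL_WOULD_HANG  (-1000) /* hypothetical sentinel: the wait never ends */

/* Hypothetical, heavily simplified task state. NOT kernel code. */
struct task_model {
    bool sigkill_pending; /* SIGKILL is still queued for this task */
    bool exiting;         /* task is inside do_exit() */
};

/* Models get_signal() taking a fatal signal (assumption): SIGKILL is
 * dequeued, i.e. consumed, and the task heads for do_group_exit(). */
static void model_get_signal(struct task_model *t)
{
    if (t->sigkill_pending) {
        t->sigkill_pending = false;
        t->exiting = true;
    }
}

/* Models wait_event_killable(): returns 0 once the condition holds,
 * -ERESTARTSYS while a fatal signal is pending, otherwise sleeps again.
 * The loop is bounded so that the model itself cannot hang. */
static int model_wait_killable(const struct task_model *t,
                               bool queue_drained, int max_wakeups)
{
    for (int i = 0; i < max_wakeups; i++) {
        if (queue_drained)
            return 0;
        if (t->sigkill_pending)
            return -MODEL_ERESTARTSYS;
        /* a real task would schedule() here and wait to be woken */
    }
    return MODEL_WOULD_HANG;
}
```

While SIGKILL is still queued, the modelled wait is interrupted with
-ERESTARTSYS. Once model_get_signal() has consumed it, only the queue
draining can end the wait, so a queue that never drains leaves the
exiting task stuck.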

>
> In general waiting seems wrong when the process has already been
> fatally killed as indicated by PF_SIGNALED.

So indeed, this patch avoids waiting in this case.

>
> Returning -ERESTARTSYS seems wrong as nothing should make it back even
> to the edge of userspace here.

Can you please clarify what should be returned here instead?

Andrey

>
> Given that this is the only use of PF_SIGNALED outside of BSD process
> accounting, I find this code very suspicious.
>
> It looks like the code path that gets called during exit is buggy and
> needs to be sorted out.
>
> Eric
>
>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@....com>
>> ---
>>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>> index 088ff2b..09fd258 100644
>> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>> @@ -227,9 +227,10 @@ void drm_sched_entity_do_release(struct drm_gpu_scheduler *sched,
>>   		return;
>>   	/**
>>   	 * The client will not queue more IBs during this fini, consume existing
>> -	 * queued IBs or discard them on SIGKILL
>> +	 * queued IBs or discard them when in death signal state since
>> +	 * wait_event_killable can't receive signals in that state.
>>   	*/
>> -	if ((current->flags & PF_SIGNALED) && current->exit_code == SIGKILL)
>> +	if (current->flags & PF_SIGNALED)
>>   		entity->fini_status = -ERESTARTSYS;
>>   	else
>>   		entity->fini_status = wait_event_killable(sched->job_scheduled,
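The behavioural difference in the hunk above can be checked in
isolation. This is a sketch assuming only the two conditions visible in
the diff; skip_wait_before(), skip_wait_after() and MODEL_PF_SIGNALED
are hypothetical stand-ins, not kernel symbols. A task with PF_SIGNALED
set whose exit_code is not SIGKILL (SIGTERM is used as an example value)
waited before the patch and skips the wait after it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>   /* SIGKILL, SIGTERM */
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's PF_SIGNALED task flag
 * ("killed by a signal"); the value is arbitrary for this sketch. */
#define MODEL_PF_SIGNALED 0x00000400u

/* Condition before the patch: skip the wait only if the task was
 * killed by a signal AND its exit_code is SIGKILL. */
static bool skip_wait_before(unsigned int flags, int exit_code)
{
    return (flags & MODEL_PF_SIGNALED) && exit_code == SIGKILL;
}

/* Condition after the patch: skip the wait whenever PF_SIGNALED is set. */
static bool skip_wait_after(unsigned int flags, int exit_code)
{
    (void)exit_code; /* no longer consulted */
    return (flags & MODEL_PF_SIGNALED) != 0;
}
```

Dropping the exit_code comparison widens the "discard queued IBs" branch
to every task marked PF_SIGNALED, which is the case Eric questions above.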