linux-kernel - Re: [PATCH 2/3] drm/scheduler: Don't call wait_event

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 2 May 2018 13:48:31 +0200
From:   Christian König <christian.koenig@....com>
To:     Andrey Grodzovsky <Andrey.Grodzovsky@....com>,
        Oleg Nesterov <oleg@...hat.com>
Cc:     David.Panariti@....com, linux-kernel@...r.kernel.org,
        amd-gfx@...ts.freedesktop.org,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        Alexander.Deucher@....com, akpm@...ux-foundation.org
Subject: Re: [PATCH 2/3] drm/scheduler: Don't call wait_event_killable for
 signaled process.

Am 30.04.2018 um 21:28 schrieb Andrey Grodzovsky:
>
>
> On 04/30/2018 02:29 PM, Christian König wrote:
>> Am 30.04.2018 um 18:10 schrieb Andrey Grodzovsky:
>>>
>>>
>>> On 04/30/2018 12:00 PM, Oleg Nesterov wrote:
>>>> On 04/30, Andrey Grodzovsky wrote:
>>>>> What about changing PF_SIGNALED to PF_EXITING in
>>>>> drm_sched_entity_do_release
>>>>>
>>>>> -       if ((current->flags & PF_SIGNALED) && current->exit_code 
>>>>> == SIGKILL)
>>>>> +      if ((current->flags & PF_EXITING) && current->exit_code == 
>>>>> SIGKILL)
>>>> let me repeat, please don't use task->exit_code. And in fact this 
>>>> check is racy
>>>>
>>>> But this doesn't matter. Say, we can trivially add 
>>>> SIGNAL_GROUP_KILLED_BY_SIGKILL,
>>>> or do something else,
>>>
>>>
>>> Can you explain where is the race and what is a possible alternative 
>>> then ?
>>
>> The race is that the release doesn't necessarily comes from the 
>> process/context which used the fd.
>>
>> E.g. it is just called when the last reference count goes away, but 
>> that can be anywhere not related to the original process using it, 
>> e.g. in a kernel thread or a debugger etc...
>
> I still don't see how it is a problem, if release comes from another 
> task, then our process  (let's say Firefox who received SIGKILL) won't 
> even get here since fput will not call .release so it will die instantly,

And exactly that's the problem. We would then just block whatever task 
is the last one to release the fd.

> the last process who holds the reference (let's say the debugger) when 
> finish will just go to wait_event_timeout and wait for SW queue to be 
> empty from jobs (if any). So all the jobs will have their chance to 
> get to HW anyway.

Yeah, but that's exactly what we want to avoid when the process is 
killed by a signal.

>> The approach with the flush is indeed a really nice idea and I bite 
>> myself to not had that previously as well.
>
> Regarding your request from another email to investigate more on .flush
>
> Looked at the code and did some reading -
>
> From LDD3
> "The flush operation is invoked when a process closes its copy of a 
> file descriptor for a device; it should execute (and wait for) any 
> outstanding operations on the device"

Sounds exactly like what we need.

>
> From printing back trace from dummy .flush hook in our driver -
>
> Normal exit (process terminates on it's own)
>
> [  295.586130 <    0.000006>]  dump_stack+0x5c/0x78
> [  295.586273 <    0.000143>]  my_flush+0xa/0x10 [amdgpu]
> [  295.586283 <    0.000010>]  filp_close+0x4a/0x90
> [  295.586288 <    0.000005>]  SyS_close+0x2d/0x60
> [  295.586295 <    0.000003>]  do_syscall_64+0xee/0x270
>
> Exit triggered by fatal signal (not handled  signal, including SIGKILL)
>
> [  356.551456 <    0.000008>]  dump_stack+0x5c/0x78
> [  356.551592 <    0.000136>]  my_flush+0xa/0x10 [amdgpu]
> [  356.551597 <    0.000005>]  filp_close+0x4a/0x90
> [  356.551605 <    0.000008>]  put_files_struct+0xaf/0x120
> [  356.551615 <    0.000010>]  do_exit+0x468/0x1280
> [  356.551669 <    0.000009>]  do_group_exit+0x89/0x140
> [  356.551679 <    0.000010>]  get_signal+0x375/0x8f0
> [  356.551696 <    0.000017>]  do_signal+0x79/0xaa0
> [  356.551756 <    0.000014>] exit_to_usermode_loop+0x83/0xd0
> [  356.551764 <    0.000008>]  do_syscall_64+0x244/0x270
>
> So as it was said here before, it will be called for every process 
> closing his FD to the file.
>
> But again, I don't quire see yet what we earn by using .flush, is it 
> that you force every process holding reference to DRM file not
> die until all jobs are submitted to HW (as long as the process not 
> being killed by  a signal) ?

If in your example firefox dies by a fatal signal we won't notice that 
when somebody else is holding the last reference. E.g. we would still 
try to submit the jobs to the hardware.

I suggest the following approach:
1. Implement the flush callback and call the function to wait for the 
scheduler to push everything to the hardware (maybe rename the scheduler 
function to flush as well).

2. Change the scheduler to test for PF_EXITING, if it's set use 
wait_event_timeout() if it isn't set use wait_event_killable().

When the wait times out or is killed set a flag so that the _fini 
function knows that. Alternatively you could cleanup the _fini function 
to work in all cases, e.g. both when there are still jobs on the queue 
and when the queue is empty. For this you need to add something like a 
struct completion to the main loop to remove this start()/stop() of the 
kernel thread.

Christian.

>
> Andrey