lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c3c9787d-b279-8169-43d1-74eeb666ffbd@gmail.com>
Date:   Mon, 30 Apr 2018 14:08:44 +0200
From:   Christian König <ckoenig.leichtzumerken@...il.com>
To:     "Eric W. Biederman" <ebiederm@...ssion.com>,
        Andrey Grodzovsky <Andrey.Grodzovsky@....com>
Cc:     David.Panariti@....com, Oleg Nesterov <oleg@...hat.com>,
        amd-gfx@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
        Alexander.Deucher@....com, akpm@...ux-foundation.org,
        Christian.Koenig@....com
Subject: Re: [PATCH 2/3] drm/scheduler: Don't call wait_event_killable for
 signaled process.

Hi Eric,

sorry for the late response, was on vacation last week.

Am 26.04.2018 um 02:01 schrieb Eric W. Biederman:
> Andrey Grodzovsky <Andrey.Grodzovsky@....com> writes:
>
>> On 04/25/2018 01:17 PM, Oleg Nesterov wrote:
>>> On 04/25, Andrey Grodzovsky wrote:
>>>> here (drm_sched_entity_fini) is also a bad idea, but we still want to be
>>>> able to exit immediately
>>>> and not wait for GPU jobs completion when the reason for reaching this code
>>>> is because of KILL
>>>> signal to the user process who opened the device file.
>>> Can you hook f_op->flush method?

THANKS! That sounds like a really good idea to me and we haven't 
investigated into that direction yet.

>> But this one is called for each task releasing a reference to the the file, so
>> not sure I see how this solves the problem.
> The big question is why do you need to wait during the final closing a
> file?

As always it's because of historical reasons. Initially user space 
pushed commands directly to a hardware queue and when a processes 
finished we didn't need to wait for anything.

Then the GPU scheduler was introduced which delayed pushing the jobs to 
the hardware queue to a later point in time.

This wait was then added to maintain backward compability and not break 
userspace (but see below).

> The wait can be terminated so the wait does not appear to be simply a
> matter of correctness.

Well when the process is killed we don't care about correctness any 
more, we just want to get rid of it as quickly as possible (OOM 
situation etc...).

But it is perfectly possible that a process submits some render commands 
and then calls exit() or terminates because of a SIGTERM, SIGINT etc.. 
In this case we need to wait here to make sure that all rendering is 
pushed to the hardware because the scheduler might need 
resources/settings from the file descriptor.

For example if you just remove that wait you could close firefox and get 
garbage on the screen for a millisecond because the remaining rendering 
commands where not executed.

So what we essentially need is to distinct between a SIGKILL (which 
means stop processing as soon as possible) and any other reason because 
then we don't want to annoy the user with garbage on the screen (even if 
it's just for a few milliseconds).

Constructive ideas how to handle this would be very welcome, cause I 
completely agree that what we have at the moment by checking PF_SIGNAL 
is just a very very hacky workaround.

Thanks,
Christian.

>
> Eric
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@...ts.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ