linux-kernel - Re: [PATCH 2/3] drm/scheduler: Don't call wait_event

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [day] [month] [year] [list]

Message-ID: <87y3hbtfgu.fsf@xmission.com>
Date:   Wed, 25 Apr 2018 11:31:45 -0500
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     Andrey Grodzovsky <Andrey.Grodzovsky@....com>
Cc:     David.Panariti@....com,
        Michel Dänzer <michel@...nzer.net>,
        linux-kernel@...r.kernel.org, dri-devel@...ts.freedesktop.org,
        oleg@...hat.com, amd-gfx@...ts.freedesktop.org,
        Alexander.Deucher@....com, akpm@...ux-foundation.org,
        Christian.Koenig@....com
Subject: Re: [PATCH 2/3] drm/scheduler: Don't call wait_event_killable for signaled process.

Andrey Grodzovsky <Andrey.Grodzovsky@....com> writes:

> On 04/25/2018 11:29 AM, Eric W. Biederman wrote:
>
>>  Another issue is changing wait_event_killable to wait_event_timeout where I need
>> to understand
>> what TO value is acceptable for all the drivers using the scheduler, or maybe it
>> should come as a property
>> of drm_sched_entity.
>>
>> It would not surprise me if you could pick a large value like 1 second
>> and issue a warning if that time outever triggers.  It sounds like the
>> condition where we wait indefinitely today is because something went
>> wrong in the driver.
>
> We wait here for all GPU jobs in flight which belong to the dying entity to complete. The driver submits
> the GPU jobs but the content of the job might be is not under driver's control and could take 
> long time to finish or even hang (e.g. graphic or compute shader) , I
> guess that why originally the wait is indefinite.


I am ignorant of what user space expect or what the semantics of the
susbsystem are here, so I might be completely off base.  But this wait
for a long time behavior I would expect much more from f_op->flush or a
f_op->fsync method.

fsync so it could be obtained without closing the file descriptor.
flush so that you could get a return value out to close.

But I honestly don't know semantically what your userspace applications
expect and/or require so I can really only say.  Those of weird semantics.

Eric