Date:   Fri, 14 Jul 2023 18:51:32 +0900
From:   Asahi Lina <lina@...hilina.net>
To:     Christian König <christian.koenig@....com>,
        Luben Tuikov <luben.tuikov@....com>,
        David Airlie <airlied@...il.com>,
        Daniel Vetter <daniel@...ll.ch>,
        Sumit Semwal <sumit.semwal@...aro.org>
Cc:     Faith Ekstrand <faith.ekstrand@...labora.com>,
        Alyssa Rosenzweig <alyssa@...enzweig.io>,
        dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
        linux-media@...r.kernel.org, asahi@...ts.linux.dev
Subject: Re: [PATCH 1/3] drm/scheduler: Add more documentation

On 14/07/2023 18.47, Christian König wrote:
> Am 14.07.23 um 11:39 schrieb Asahi Lina:
>> On 14/07/2023 17.40, Christian König wrote:
>>> Am 14.07.23 um 10:21 schrieb Asahi Lina:
>>>> Document the implied lifetime rules of the scheduler (or at least the
>>>> intended ones), as well as the expectations of how resource acquisition
>>>> should be handled.
>>>>
>>>> Signed-off-by: Asahi Lina <lina@...hilina.net>
>>>> ---
>>>>     drivers/gpu/drm/scheduler/sched_main.c | 58 ++++++++++++++++++++++++++++++++--
>>>>     1 file changed, 55 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>> index 7b2bfc10c1a5..1f3bc3606239 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>> @@ -43,9 +43,61 @@
>>>>      *
>>>>      * The jobs in a entity are always scheduled in the order that they were pushed.
>>>>      *
>>>> - * Note that once a job was taken from the entities queue and pushed to the
>>>> - * hardware, i.e. the pending queue, the entity must not be referenced anymore
>>>> - * through the jobs entity pointer.
>>>> + * Lifetime rules
>>>> + * --------------
>>>> + *
>>>> + * Getting object lifetimes right across the stack is critical to avoid UAF
>>>> + * issues. The DRM scheduler has the following lifetime rules:
>>>> + *
>>>> + * - The scheduler must outlive all of its entities.
>>>> + * - Jobs pushed to the scheduler are owned by it, and must only be freed
>>>> + *   after the free_job() callback is called.
>>>> + * - Scheduler fences are reference-counted and may outlive the scheduler.
>>>
>>>> + * - The scheduler *may* be destroyed while jobs are still in flight.
>>>
>>> That's not correct. The scheduler can only be destroyed after all the
>>> entities serving it have been destroyed and all the jobs already
>>> pushed to the hw have finished.
>>
>> The point of this series is to change this behavior so I can actually
>> use the scheduler in my use case, and that begins with formally
>> documenting it, as Daniel suggested. That is, I need it to be safe for
>> jobs to still be incomplete when the scheduler is destroyed (the
>> entities do get destroyed first; that's the first bullet point).
> 
> Yeah, but you need to document the current situation not how you like it
> to be.

Daniel told me to document how I think it should be, then fix the bugs 
that make it not so. That's what this series does.

>> We already had this discussion. Without this guarantee, I cannot build
>> a reasonably safe Rust abstraction. Unless you have another
>> suggestion, as far as I can tell it's either this or I give up on
>> using the DRM scheduler entirely and reimplement something else on my
>> own.
>>
>>> What might be possible to add is allowing the hw to still be working
>>> on the already pushed jobs, but so far that was rejected as undesirable.
>>
>> Where was this rejected?
> 
> Years ago. Our initial driver suspend/resume design relied on that.
> It turned out not to be a good idea.

Times change, maybe it's time to revisit that decision?

~~ Lina
