[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5438c132-e127-4456-9551-42c76fb521dd@amd.com>
Date:   Thu, 2 Nov 2023 11:48:25 +0100
From:   Christian König <christian.koenig@....com>
To:     Daniel Vetter <daniel@...ll.ch>, Dave Airlie <airlied@...il.com>
Cc:     Asahi Lina <lina@...hilina.net>, alyssa@...enzweig.io,
        Sumit Semwal <sumit.semwal@...aro.org>,
        Faith Ekstrand <faith.ekstrand@...labora.com>,
        dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
        linux-media@...r.kernel.org, asahi@...ts.linux.dev,
        Luben Tuikov <ltuikov89@...il.com>
Subject: Re: [PATCH 2/3] drm/scheduler: Fix UAF in
 drm_sched_fence_get_timeline_name
Am 01.11.23 um 09:13 schrieb Daniel Vetter:
> On Wed, 1 Nov 2023 at 07:59, Dave Airlie <airlied@...il.com> wrote:
>>> Well, to make it clear once more: Signaling a dma_fence from the
>>> destructor of a reference counted object is very problematic! This will
>>> be rejected no matter if you do that in C or in Rust.
>>>
>>> What we can do is to make it safe in the sense that you don't access
>>> freed up memory by using the scheduler fences even more as wrapper
>>> around the hardware fence as we do now. But this quite a change and
>>> requires a bit more than just hacking around
>>> drm_sched_fence_get_timeline_name().
>> I really think this needs to be documented if nothing else out of this thread.
>>
>> Clearly nobody is going to get it right and hidden here in this
>> thread, this info isn't useful.
>>
>> Can we have some sort of design document for the dma-fence/scheduler
>> interactions written and we can try and refine it with solutions on
>> the list, because I'm tired of people proposing things and NAK's
>> getting thrown around without anything to point people at.
>>
>> The next NAK I see on the list will mean I block all patches from the
>> sender until they write a documentation patch, because seriously this
>> stuff is too hard for someone to just keep it in their head and expect
>> everyone else to understand from reading the code.
> I very much like the idea that NAK replies are counted as "you've just
> volunteered yourself for some documentation patches so that next time
> around you can reply with a link to the docs instead of just a NAK".
Yeah, that sounds like a great idea to me as well :)
Especially when I can use it to convince managers that we need to have 
more work force on writing documentation.
> I don't think we'll get out of these discussions otherwise, since
> currently we have undocumented, but very tricky semantics of the
> drm/sched codebase for ringbuffer scheduling which is extended to fw
> scheduling in also very tricky ways, with not entirely clear impacts
> on semantics of all the drm/sched things. And as a result we just pile
> up enormous amounts of threads where I think the only thing assured is
> that people talk past each another.
The scheduler is certainly the ugliest part, but it's unfortunately 
still only the tip of the iceberg.
I have seen at least halve a dozen approach in the last two years where 
people tried to signal a dma_fence from userspace or similar.
Fortunately it was mostly prototyping and I could jump in early enough 
to stop that, but basically this is a fight against windmills.
I was considering to change the dma_fence semantics so that 
dma_fence_signal() could only be called from the interrupt contexts of 
devices and then put a big fat WARN_ON(!in_interrupt()) in there.
It's a sledgehammer, but as far as I can see the only thing which might 
help. Opinions?
Thanks,
Christian.
>
> Converting NAKs into doc patches should at least eventually get rid of
> the worst confusions we're dealing with here.
>
> Cheers, Sima
Powered by blists - more mailing lists
 
