linux-kernel - Re: [PATCH] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <cfbaceae-6d40-a8b3-e449-6473be234d2d@asahilina.net>
Date:   Thu, 6 Apr 2023 17:49:17 +0900
From:   Asahi Lina <lina@...hilina.net>
To:     Christian König <christian.koenig@....com>,
        Luben Tuikov <luben.tuikov@....com>,
        David Airlie <airlied@...il.com>,
        Daniel Vetter <daniel@...ll.ch>,
        Sumit Semwal <sumit.semwal@...aro.org>
Cc:     dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
        linux-media@...r.kernel.org, asahi@...ts.linux.dev
Subject: Re: [PATCH] drm/scheduler: Fix UAF in
 drm_sched_fence_get_timeline_name

On 06/04/2023 17.29, Christian König wrote:
> Am 05.04.23 um 18:34 schrieb Asahi Lina:
>> A signaled scheduler fence can outlive its scheduler, since fences are
>> independently reference counted.
> 
> Well that is actually not correct. Schedulers are supposed to stay
> around until the hw they have been driving is no longer present.

But the fences can outlive that. You can GPU render into an imported 
buffer, which attaches a fence to it. Then the GPU goes away but the 
fence is still attached to the buffer. Then you oops when you cat that 
debugfs file...

My use case does this way more often (since schedulers are tied to UAPI 
objects), which is how I found this, but as far as I can tell this is 
already broken for all drivers on unplug/unbind/anything else that would 
destroy the schedulers with fences potentially referenced on separate 
scanout devices or at any other DMA-BUF consumer.

> E.g. the reference was scheduler_fence->hw_fence->driver->scheduler.

It's up to drivers not to mess that up, since the HW fence has the same 
requirements that it can outlive other driver objects, just like any 
other fence. That's not something the scheduler has to be concerned 
with, it's a driver correctness issue.

Of course, in C you have to get it right yourself, while with correct 
Rust abstractions will cause your code to fail to compile if you do it 
wrong ^^

In my particular case, the hw_fence is a very dumb object that has no 
references to anything, only an ID and a pending op count. Jobs hold 
references to it and decrement it until it signals, not the other way 
around. So that object can live forever regardless of whether the rest 
of the device is gone.

> Your use case is now completely different to that and this won't work
> any more.
> 
> This here might just be the first case where that breaks.

This bug already exists, it's just a lot rarer for existing use cases... 
but either way Xe is doing the same thing I am, so I'm not the only one 
here either.

~~ Lina