[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <649c7fe3469c39496db89de6951d9f2b61c36576.camel@mailbox.org>
Date: Tue, 22 Apr 2025 14:00:11 +0200
From: Philipp Stanner <phasta@...lbox.org>
To: Danilo Krummrich <dakr@...nel.org>, Tvrtko Ursulin
<tvrtko.ursulin@...lia.com>
Cc: phasta@...nel.org, Lyude Paul <lyude@...hat.com>, David Airlie
<airlied@...il.com>, Simona Vetter <simona@...ll.ch>, Matthew Brost
<matthew.brost@...el.com>, Christian König
<ckoenig.leichtzumerken@...il.com>, Maarten Lankhorst
<maarten.lankhorst@...ux.intel.com>, Maxime Ripard <mripard@...nel.org>,
Thomas Zimmermann <tzimmermann@...e.de>, dri-devel@...ts.freedesktop.org,
nouveau@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/5] drm/sched: Warn if pending list is not empty
On Tue, 2025-04-22 at 13:13 +0200, Danilo Krummrich wrote:
> On Tue, Apr 22, 2025 at 11:39:11AM +0100, Tvrtko Ursulin wrote:
> > Question I raised is if there are other drivers which manage to
> > clean up
> > everything correctly (like the mock scheduler does), but trigger
> > that
> > warning. Maybe there are not and maybe mock scheduler is the only
> > false
> > positive.
For clarification:
I messed up the comment from the cover letter.
What I did was run the *old* unit tests (v5 IIRC) from Tvrtko that
still had the memory leaks. Those then trigger the warning print, as is
expected, since they don't provide fence_context_kill().
The current unit tests are fine memory-leak-wise.
IOW, both with Nouveau and the unit tests, everything behaves as
expected, without issues.
Sorry for the confusion.
P.
>
> So far the scheduler simply does not give any guideline on how to
> address the
> problem, hence every driver simply does something (or nothing,
> effectively
> ignoring the problem). This is what we want to fix.
>
> The mock scheduler keeps it's own list of pending jobs and on tear
> down stops
> the scheduler's workqueues, traverses it's own list and eventually
> frees the
> pending jobs without updating the scheduler's internal pending list.
>
> So yes, it does avoid memory leaks, but it also leaves the schedulers
> internal
> structures with an invalid state, i.e. the pending list of the
> scheduler has
> pointers to already freed memory.
>
> What if the drm_sched_fini() starts touching the pending list? Then
> you'd end up
> with UAF bugs with this implementation. We cannot invalidate the
> schedulers
> internal structures and yet call scheduler functions - e.g.
> drm_sched_fini() -
> subsequently.
>
> Hence, the current implementation of the mock scheduler is
> fundamentally flawed.
> And so would be *every* driver that still has entries within the
> scheduler's
> pending list.
>
> This is not a false positive, it already caught a real bug -- in the
> mock
> scheduler.
>
> - Danilo
Powered by blists - more mailing lists