lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d3c0f721-2d19-4a1c-a086-33e8d6bd7be6@igalia.com>
Date: Tue, 22 Apr 2025 13:07:47 +0100
From: Tvrtko Ursulin <tvrtko.ursulin@...lia.com>
To: Danilo Krummrich <dakr@...nel.org>
Cc: phasta@...nel.org, Lyude Paul <lyude@...hat.com>,
 David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
 Matthew Brost <matthew.brost@...el.com>,
 Christian König <ckoenig.leichtzumerken@...il.com>,
 Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
 Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
 dri-devel@...ts.freedesktop.org, nouveau@...ts.freedesktop.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/5] drm/sched: Warn if pending list is not empty


On 22/04/2025 12:13, Danilo Krummrich wrote:
> On Tue, Apr 22, 2025 at 11:39:11AM +0100, Tvrtko Ursulin wrote:
>> Question I raised is if there are other drivers which manage to clean up
>> everything correctly (like the mock scheduler does), but trigger that
>> warning. Maybe there are not and maybe mock scheduler is the only false
>> positive.
> 
> So far the scheduler simply does not give any guideline on how to address the
> problem, hence every driver simply does something (or nothing, effectively
> ignoring the problem). This is what we want to fix.
> 
> The mock scheduler keeps it's own list of pending jobs and on tear down stops
> the scheduler's workqueues, traverses it's own list and eventually frees the
> pending jobs without updating the scheduler's internal pending list.
> 
> So yes, it does avoid memory leaks, but it also leaves the schedulers internal
> structures with an invalid state, i.e. the pending list of the scheduler has
> pointers to already freed memory.
> 
> What if the drm_sched_fini() starts touching the pending list? Then you'd end up
> with UAF bugs with this implementation. We cannot invalidate the schedulers
> internal structures and yet call scheduler functions - e.g. drm_sched_fini() -
> subsequently.
> 
> Hence, the current implementation of the mock scheduler is fundamentally flawed.
> And so would be *every* driver that still has entries within the scheduler's
> pending list.
> 
> This is not a false positive, it already caught a real bug -- in the mock
> scheduler.

To avoid furher splitting hairs on whether real bugs need to be able to 
manifest or not, lets move past this with a conclusion that there are 
two potential things to do here:

First one is to either send separately or include in this series 
something like:

diff --git a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c 
b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
index f999c8859cf7..7c4df0e890ac 100644
--- a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
+++ b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
@@ -300,6 +300,8 @@ void drm_mock_sched_fini(struct drm_mock_scheduler 
*sched)
                 drm_mock_sched_job_complete(job);
         spin_unlock_irqrestore(&sched->lock, flags);

+       drm_sched_fini(&sched->base);
+
         /*
          * Free completed jobs and jobs not yet processed by the DRM 
scheduler
          * free worker.
@@ -311,8 +313,6 @@ void drm_mock_sched_fini(struct drm_mock_scheduler 
*sched)

         list_for_each_entry_safe(job, next, &list, link)
                 mock_sched_free_job(&job->base);
-
-       drm_sched_fini(&sched->base);
  }

  /**

That should satisfy the requirement to "clear" memory about to be freed 
and be 100% compliant with drm_sched_fini() kerneldoc (guideline b).

But the new warning from 3/5 here will still be there AFAICT and would 
you then agree it is a false positive?

Secondly, the series should modify all drivers (including the unit 
tests) which are known to trigger this false positive.

Regards,

Tvrtko


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ