Message-ID: <aFF6xeu78cXTGFH0@phenom.ffwll.local>
Date: Tue, 17 Jun 2025 16:25:09 +0200
From: Simona Vetter <simona.vetter@...ll.ch>
To: Danilo Krummrich <dakr@...nel.org>
Cc: Philipp Stanner <phasta@...nel.org>,
Matthew Brost <matthew.brost@...el.com>,
Christian König <ckoenig.leichtzumerken@...il.com>,
David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
Sumit Semwal <sumit.semwal@...aro.org>,
dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
linux-media@...r.kernel.org
Subject: Re: [PATCH v2] drm/sched: Clarify scenarios for separate workqueues
On Tue, Jun 17, 2025 at 04:10:40PM +0200, Danilo Krummrich wrote:
> On Tue, Jun 17, 2025 at 03:51:33PM +0200, Simona Vetter wrote:
> > On Thu, Jun 12, 2025 at 04:49:54PM +0200, Philipp Stanner wrote:
> > > + * NOTE that sharing &struct drm_sched_init_args.submit_wq with the driver
> > > + * theoretically can deadlock. It must be guaranteed that submit_wq never has
> > > + * more than max_active - 1 active tasks, or if max_active tasks are reached at
> > > + * least one of them does not execute operations that may block on dma_fences
> > > + * that potentially make progress through this scheduler instance. Otherwise,
> > > + * it is possible that all max_active tasks end up waiting on a dma_fence (that
> > > + * can only make progress through this scheduler instance), while the
> > > + * scheduler's queued work waits for at least one of the max_active tasks to
> > > + * finish. Thus, this can result in a deadlock.
> >
> > Uh if you have an ordered wq you deadlock with just one misuse. I'd just
> > explain that the wq must provide sufficient forward-progress guarantees
> > for the scheduler, specifically that it's on the dma_fence signalling
> > critical path and leave the concrete examples for people to figure out
> > when they design a specific locking scheme.
>
> This isn't a concrete example, is it? It's exactly what you say in slightly
> different words, with the addition of highlighting the impact of the workqueue's
> max_active configuration.
>
> I think that's relevant, because N - 1 active tasks can be on the dma_fence
> signalling critical path without issues.
>
> We could change
>
> "if max_active tasks are reached at least one of them must not execute
> operations that may block on dma_fences that potentially make progress
> through this scheduler instance"
>
> to
>
> "if max_active tasks are reached at least one of them must not be on the
> dma_fence signalling critical path"
>
> which is a bit more to the point I think.
My point was more to state the general issue, namely that the wq must be
suitable for the scheduler jobs, and then specifically highlight the
dma_fence concurrency issue. But that's not the only one, you can have
driver locks and other fun involved here too.
Also since all the paragraphs above talk about ordered wq as the example
where specifying your own wq makes sense, it's a bit confusing to now
suddenly only talk about the concurrent wq case without again mentioning
that the ordered wq case is really limited.
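
For illustration, a rough sketch of what the dedicated-wq case could look
like on the driver side (identifiers like my_sched_ops, my_dev and the
chosen values are made up, and the drm_sched_init_args details are from
memory, so double-check against the current header):

	struct workqueue_struct *wq;
	struct drm_sched_init_args args = { };
	int ret;

	/* An ordered wq used *only* by this scheduler: no driver work item
	 * on the dma_fence signalling critical path can occupy its single
	 * active slot and starve the scheduler's submit/free work.
	 */
	wq = alloc_ordered_workqueue("my-sched", 0);
	if (!wq)
		return -ENOMEM;

	args.ops = &my_sched_ops;	/* driver's drm_sched_backend_ops */
	args.submit_wq = wq;		/* dedicated, not shared with other driver work */
	args.num_rqs = DRM_SCHED_PRIORITY_COUNT;
	args.credit_limit = my_hw_ring_size;
	args.timeout = msecs_to_jiffies(500);
	args.name = "my-sched";
	args.dev = my_dev;

	ret = drm_sched_init(&my_sched, &args);
	if (ret)
		destroy_workqueue(wq);
	return ret;

The point being: as soon as the same wq is also shared with other driver
work that can block on dma_fences signalled through this scheduler, the
forward-progress guarantee is gone, and for an ordered wq (max_active == 1)
a single such work item is already enough to deadlock.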
-Sima
--
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch