Message-ID: <aEB9/VGHJGnY4+fP@lstrano-desk.jf.intel.com>
Date: Wed, 4 Jun 2025 10:10:21 -0700
From: Matthew Brost <matthew.brost@...el.com>
To: Danilo Krummrich <dakr@...nel.org>
CC: Simona Vetter <simona.vetter@...ll.ch>, Christian König
	<christian.koenig@....com>, Philipp Stanner <phasta@...nel.org>, David Airlie
	<airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
	<dri-devel@...ts.freedesktop.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] drm/sched: Discourage usage of separate workqueues

On Wed, Jun 04, 2025 at 06:53:44PM +0200, Danilo Krummrich wrote:
> On Wed, Jun 04, 2025 at 09:45:00AM -0700, Matthew Brost wrote:
> > On Wed, Jun 04, 2025 at 05:07:15PM +0200, Simona Vetter wrote:
> > > We should definitely document this trick better though, I didn't find any
> > > place where that was documented.
> > 
> > This is a good idea.
> 
> I think - and I also mentioned this a few times in the patch series that added
> the workqueue support - we should also really document the pitfalls of this.
> 
> If the scheduler shares a workqueue with the driver, the driver needs to take
> special care that the work it submits cannot prevent the run_job and free_job
> work items from running.
> 
> For instance, if it's a single-threaded workqueue and the driver submits work
> that allocates with GFP_KERNEL, that is a deadlock condition.
> 
> More generally, if the driver submits N work items that, for instance, allocate
> with GFP_KERNEL, it's also a deadlock condition if N == max_active.
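
To illustrate (a hypothetical sketch, not taken from any real driver -- the
my_driver structure and names below are made up): with one ordered workqueue
shared between the scheduler and the driver, a single driver work item that
allocates with GFP_KERNEL is enough to wedge everything:

static void driver_work_fn(struct work_struct *work)
{
	/*
	 * GFP_KERNEL may enter direct reclaim, and reclaim may wait on a
	 * dma_fence that only signals once the scheduler's run_job work
	 * runs -- but run_job is queued behind this work item on the same
	 * single-threaded workqueue, so neither ever makes progress.
	 */
	void *buf = kmalloc(SZ_1M, GFP_KERNEL);

	/* ... */
	kfree(buf);
}

static int driver_init(struct my_driver *drv)
{
	drv->wq = alloc_ordered_workqueue("drv-and-sched-wq", 0);
	/* the same wq is also handed to drm_sched_init() as submit_wq */
	INIT_WORK(&drv->work, driver_work_fn);
	queue_work(drv->wq, &drv->work);
	return 0;
}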

Can we prime lockdep on scheduler init? e.g.

fs_reclaim_acquire(GFP_KERNEL);
workqueue_lockdep_acquire();
workqueue_lockdep_release();
fs_reclaim_release(GFP_KERNEL);

In addition to documentation, this would prevent the use of workqueues whose
work items allocate with GFP_KERNEL.
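
A rough sketch of where that could live, say at the end of drm_sched_init()
(workqueue_lockdep_acquire()/release() are assumed helpers that take and drop
the workqueue's internal lockdep map -- nothing like that is exported today):

#ifdef CONFIG_LOCKDEP
	/*
	 * Record up front that reclaim may wait on work from the submit
	 * workqueue (fences signal from there).  Any later GFP_KERNEL
	 * allocation in work on that workqueue then produces a lockdep
	 * splat at test time instead of a rare runtime deadlock.
	 */
	fs_reclaim_acquire(GFP_KERNEL);
	workqueue_lockdep_acquire(sched->submit_wq);
	workqueue_lockdep_release(sched->submit_wq);
	fs_reclaim_release(GFP_KERNEL);
#endif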

Maybe we could use dma_fence signalling annotations instead of
fs_reclaim_acquire, but at one point those gave Xe false lockdep
positives, so we use fs_reclaim_acquire in similar cases. Maybe that has
been fixed, though.
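
For completeness, the dma_fence-annotation flavour would look roughly like
this (dma_fence_begin_signalling()/dma_fence_end_signalling() are the real
helpers; the workqueue side is the same assumed helper as above):

	bool cookie = dma_fence_begin_signalling();

	workqueue_lockdep_acquire(sched->submit_wq);
	workqueue_lockdep_release(sched->submit_wq);
	dma_fence_end_signalling(cookie);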

Matt
