Message-ID: <6b6267872fcc5e75883144f241c79c93c03fcead.camel@mailbox.org>
Date: Fri, 25 Apr 2025 11:57:18 +0200
From: Philipp Stanner <phasta@...lbox.org>
To: Alice Ryhl <aliceryhl@...gle.com>, Tejun Heo <tj@...nel.org>
Cc: Lai Jiangshan <jiangshanlai@...il.com>, Danilo Krummrich
 <dakr@...nel.org>,  linux-kernel@...r.kernel.org
Subject: Re: [PATCH] workqueue: flush all pending jobs in destroy_workqueue()

On Fri, 2025-04-25 at 09:33 +0000, Alice Ryhl wrote:
> On Thu, Apr 24, 2025 at 09:57:55AM -1000, Tejun Heo wrote:
> > Hello, Alice.
> > 
> > On Wed, Apr 23, 2025 at 05:51:27PM +0000, Alice Ryhl wrote:
> > ...
> > > @@ -367,6 +367,8 @@ struct workqueue_struct {
> > >  	struct lockdep_map	__lockdep_map;
> > >  	struct lockdep_map	*lockdep_map;
> > >  #endif
> > > +	raw_spinlock_t		delayed_lock;	/* protects pending_list */
> > > +	struct list_head	delayed_list;	/* list of pending delayed jobs */
> > 
> > I think we'll have to make this per-CPU or per-pwq. There can be a lot of
> > delayed work items being queued on, e.g., system_wq. Imagine that happening
> > on a multi-socket NUMA system. That cacheline is going to be bounced around
> > pretty hard.
> 
> Hmm. I think we would need to add a new field to delayed_work to keep
> track of which list it has been added to.
> 
> Another option could be to add a boolean that disables the list. After
> all, we never call destroy_workqueue() on system_wq so we don't need the
> list for that workqueue.
> 
> Thoughts?

For my part, I was astonished to actually find this half-bug in the WQ
implementation. WQs are a) very important and b) very heavily used, so
I had expected that the bug *must* be on my side. That it wasn't
suggests to me that not many users in the kernel tear down a WQ while
non-canceled delayed work (DW) is still pending.

You also have to race a bit to run into the problem.

I'm not sure how relevant that is to the synchronization overhead
Tejun describes, but take it for what it's worth.
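
To make the boolean idea quoted above concrete, here is a rough userspace
sketch in plain C. It is not kernel code and not taken from the patch:
struct model_wq, track_delayed, model_queue_delayed(), model_destroy() and
the lock_acquisitions counter are all hypothetical names invented for
illustration. The counter only stands in for taking delayed_lock, so it
shows that a queue with tracking disabled never touches the shared list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel API. */
struct delayed_item {
	struct delayed_item *next;		/* link in the per-queue pending list */
	void (*fn)(struct delayed_item *);	/* callback to run when flushed */
	bool pending;
};

struct model_wq {
	bool track_delayed;			/* false for queues that are never destroyed */
	struct delayed_item *pending_head;	/* stands in for delayed_list */
	unsigned long lock_acquisitions;	/* stands in for taking delayed_lock */
};

/* Sample callback that only counts how often it was run. */
static int model_ran_count;

static void model_count_fn(struct delayed_item *it)
{
	(void)it;
	model_ran_count++;
}

static void model_wq_init(struct model_wq *wq, bool track_delayed)
{
	wq->track_delayed = track_delayed;
	wq->pending_head = NULL;
	wq->lock_acquisitions = 0;
}

/* Queue a delayed item; touch the shared list only if tracking is enabled. */
static void model_queue_delayed(struct model_wq *wq, struct delayed_item *it,
				void (*fn)(struct delayed_item *))
{
	it->fn = fn;
	it->pending = true;
	it->next = NULL;

	if (!wq->track_delayed)
		return;			/* no shared state written on this path */

	wq->lock_acquisitions++;	/* the lock would be taken here */
	it->next = wq->pending_head;
	wq->pending_head = it;
}

/* Tear down: run everything still on the list, return how many were run. */
static int model_destroy(struct model_wq *wq)
{
	int flushed = 0;

	/* In this model, a queue without tracking must never be destroyed. */
	assert(wq->track_delayed);

	wq->lock_acquisitions++;	/* the lock would be taken here */
	while (wq->pending_head) {
		struct delayed_item *it = wq->pending_head;

		wq->pending_head = it->next;
		it->next = NULL;
		it->pending = false;
		it->fn(it);
		flushed++;
	}
	return flushed;
}
```

The model leaves out real locking, timers and the cancel path, so it says
nothing about the race itself. It only illustrates the trade-off: queues
that are never destroyed skip the shared list, at the cost of one extra
flag check per queueing operation.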


P.

> 
> Alice

