lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 02 Jul 2007 10:37:58 +0200
From:	Johannes Berg <johannes@...solutions.net>
To:	Oleg Nesterov <oleg@...sign.ru>
Cc:	Ingo Molnar <mingo@...hat.com>,
	Arjan van de Ven <arjan@...ux.intel.com>,
	Linux Kernel list <linux-kernel@...r.kernel.org>,
	linux-wireless <linux-wireless@...r.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>, mingo@...e.hu,
	Thomas Sattler <tsattler@....de>
Subject: Re: [RFC/PATCH] debug workqueue deadlocks with lockdep

On Sat, 2007-06-30 at 15:46 +0400, Oleg Nesterov wrote:
> On 06/30, Ingo Molnar wrote:
> >
> > On Thu, 2007-06-28 at 19:33 +0200, Johannes Berg wrote:
> > > No, that's not right either, but Arjan just helped me a bit with how
> > > lockdep works and I think I have the right idea now. Ignore this for
> > > now, I'll send a new patch in a few days.
> > 
> > ok. But in general, this is a very nice idea!
> > 
> > i've Cc:-ed Oleg. Oleg, what do you think? I think we should keep all
> > the workqueue APIs specified in a form that makes them lockdep coverable
> > like Johannes did. This debug mechanism could have helped with the
> > recent DVB lockup that Thomas Sattler reported.
> 
> I think this idea is great!

:)

> Johannes, could you change wait_on_work() as well? Most users of
> flush_workqueue() should be converted to use it.

Sure. The case I had used flush_workqueue() but I'll look into
wait_on_work() and maybe converting the place where I had this.

> > @@ -342,6 +351,9 @@ static int flush_cpu_workqueue(struct cp
> >         } else {
> >                 struct wq_barrier barr;
> >
> > +               lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> > +               lock_release(&cwq->wq->lockdep_map, 0, _THIS_IP_);
> > +
> >                 active = 0;
> >                 spin_lock_irq(&cwq->lock);
> 
> I am not sure why you skip "if (cwq->thread == current)" case, it can
> deadlock in the same way.

I wasn't sure what would happen with recursion.

> But, perhaps we should not change flush_cpu_workqueue(). If we detect the
> deadlock, we will have num_online_cpus() reports, yes?

Not sure what you're thinking of here. I initially had it in
flush_workqueue() but then put it into flush_cpu_workqueue(), but I have
to admit that I already forgot why.

> And,
> 
> >                 if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
> > @@ -376,6 +388,8 @@ void fastcall flush_workqueue(struct wor
> >         int cpu;
> >
> >         might_sleep();
> > +       lock_acquire(&wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> > +       lock_release(&wq->lockdep_map, 0, _THIS_IP_);
> 
> one of the 2 callers was already modified. Perhaps it is better to add
> lock_acquire() into the second caller, cleanup_workqueue_thread(), but
> skip flush_cpu_workqueue() ?

I'll have to look at the code.

The problem I discussed with Arjan is that with this patch all
workqueues are in the same class which is wrong, so I'll have to modify
the workqueue creation API by introducing some macros for the LOCKDEP
case that pass the current code spot as the workqueue class. I'll try to
do that later today.

johannes

Download attachment "signature.asc" of type "application/pgp-signature" (191 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ