linux-kernel - Re: [RFC/PATCH] debug workqueue deadlocks with lockdep

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <1183365478.4089.90.camel@johannes.berg>
Date:	Mon, 02 Jul 2007 10:37:58 +0200
From:	Johannes Berg <johannes@...solutions.net>
To:	Oleg Nesterov <oleg@...sign.ru>
Cc:	Ingo Molnar <mingo@...hat.com>,
	Arjan van de Ven <arjan@...ux.intel.com>,
	Linux Kernel list <linux-kernel@...r.kernel.org>,
	linux-wireless <linux-wireless@...r.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>, mingo@...e.hu,
	Thomas Sattler <tsattler@....de>
Subject: Re: [RFC/PATCH] debug workqueue deadlocks with lockdep

On Sat, 2007-06-30 at 15:46 +0400, Oleg Nesterov wrote:
> On 06/30, Ingo Molnar wrote:
> >
> > On Thu, 2007-06-28 at 19:33 +0200, Johannes Berg wrote:
> > > No, that's not right either, but Arjan just helped me a bit with how
> > > lockdep works and I think I have the right idea now. Ignore this for
> > > now, I'll send a new patch in a few days.
> > 
> > ok. But in general, this is a very nice idea!
> > 
> > i've Cc:-ed Oleg. Oleg, what do you think? I think we should keep all
> > the workqueue APIs specified in a form that makes them lockdep coverable
> > like Johannes did. This debug mechanism could have helped with the
> > recent DVB lockup that Thomas Sattler reported.
> 
> I think this idea is great!

:)

> Johannes, could you change wait_on_work() as well? Most users of
> flush_workqueue() should be converted to use it.

Sure. The case I had used flush_workqueue() but I'll look into
wait_on_work() and maybe converting the place where I had this.

> > @@ -342,6 +351,9 @@ static int flush_cpu_workqueue(struct cp
> >         } else {
> >                 struct wq_barrier barr;
> >
> > +               lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> > +               lock_release(&cwq->wq->lockdep_map, 0, _THIS_IP_);
> > +
> >                 active = 0;
> >                 spin_lock_irq(&cwq->lock);
> 
> I am not sure why you skip "if (cwq->thread == current)" case, it can
> deadlock in the same way.

I wasn't sure what would happen with recursion.

> But, perhaps we should not change flush_cpu_workqueue(). If we detect the
> deadlock, we will have num_online_cpus() reports, yes?

Not sure what you're thinking of here. I initially had it in
flush_workqueue() but then put it into flush_cpu_workqueue(), but I have
to admit that I already forgot why.

> And,
> 
> >                 if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
> > @@ -376,6 +388,8 @@ void fastcall flush_workqueue(struct wor
> >         int cpu;
> >
> >         might_sleep();
> > +       lock_acquire(&wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> > +       lock_release(&wq->lockdep_map, 0, _THIS_IP_);
> 
> one of the 2 callers was already modified. Perhaps it is better to add
> lock_acquire() into the second caller, cleanup_workqueue_thread(), but
> skip flush_cpu_workqueue() ?

I'll have to look at the code.

The problem I discussed with Arjan is that with this patch all
workqueues are in the same class which is wrong, so I'll have to modify
the workqueue creation API by introducing some macros for the LOCKDEP
case that pass the current code spot as the workqueue class. I'll try to
do that later today.

johannes

Download attachment "signature.asc" of type "application/pgp-signature" (191 bytes)