[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1183549772.3812.10.camel@johannes.berg>
Date: Wed, 04 Jul 2007 13:49:32 +0200
From: Johannes Berg <johannes@...solutions.net>
To: Oleg Nesterov <oleg@...sign.ru>
Cc: Ingo Molnar <mingo@...hat.com>,
Arjan van de Ven <arjan@...ux.intel.com>,
Linux Kernel list <linux-kernel@...r.kernel.org>,
linux-wireless <linux-wireless@...r.kernel.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>, mingo@...e.hu,
Thomas Sattler <tsattler@....de>
Subject: Re: [RFC/PATCH] debug workqueue deadlocks with lockdep
Oleg,
Thanks for your comments. Shows how little I really understand the
workqueue API :)
On Tue, 2007-07-03 at 21:31 +0400, Oleg Nesterov wrote:
> > I think this could lead to false positives, but then I think we
> > shouldn't care about those. Let me explain. The thing is that with this
> > it can happen that the code using the workqueue somehow obeys an
> > ordering in the work it queues, so as far as I can tell it'd be fine to
> > have two work items A and B where only B takes a lock L1, and then have
> > a wait_on_work(A) under L1, if and only if B was always queued right
> > after A (or something like that).
>
> Not sure I understand. Yes, we can have false positives, but I think the
> ordering in the workqueue doesn't matter.
>
> If A does NOT take a lock L1, then it is OK to do cancel_work_sync(A)
> under L1, regardless of which other work_structs this workqueue has,
> before or after A.
Ah, cancel_work_sync() waits only for it if A is currently running?
> Now we have a false positive if some time we queue B into that workqueue,
> and this is not good.
Right. I was thinking of the flush_workqueue case where any of A or B
matters.
> We can avoid this problem if we put lockdep_map into work_struct, so
> that wait_on_work() "locks" work->lockdep_map, while flush_workqueue()
> takes wq->lockdep_map.
Yeah, and then we'll take both wq->lockdep_map and the
work_struct->lockdep_map when running that work. That should work, I'll
give it a go later.
> But probably we don't need this right now, at least until we really
> have a lot of false positives while converting from flush_workqueue()
> to cancel_work_sync/cancel_delayed_work_sync.
I didn't really think about those yet, but I think you're right.
> > @@ -257,7 +260,9 @@ static void run_workqueue(struct cpu_wor
> >
> > BUG_ON(get_wq_data(work) != cwq);
> > work_clear_pending(work);
> > + lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> > f(work);
> > + lock_release(&cwq->wq->lockdep_map, 0, _THIS_IP_);
> ^^^
> Isn't it better to call lock_release() with nested == 1 ?
Not sure, Ingo?
> > +#define create_workqueue(name) \
> > +({ \
> > + static struct lock_class_key __key; \
> > + struct workqueue_struct *__wq; \
> > + \
> > + __wq = __create_workqueue((name), 0, 0, &__key); \
> > + __wq; \
> > +})
>
> Why do we need __wq ?
No particular reason I think, I copied some other code doing it that
way.
> +#define create_workqueue(name) \
> ({ \
> static struct lock_class_key __key; \
> __create_workqueue((name), 0, 0, &__key); \
> })
>
> Actually, I'd suggest to rename __create_workqueue() to __create_workqueue_key(),
> and then you need the only change in linux/workqueue.h
>
> - extern struct workqueue_struct *__create_workqueue(...);
> + extern struct workqueue_struct *__create_workqueue_key(..., key);
> + #define __create_workqueue(...) \
> + static struct lock_class_key __key; \
> + __create_workqueue_key(..., key); \
>
> but this is a matter of taste.
Sure, that works. I thought about compiling out the argument completely
for the no-lockdep case, would you prefer that?
> Btw, I think your patch found a real bug in net/mac80211/, cool!
Actually, I found the bug by experiencing it and analysing the stack
traces; then I thought that it should be possible to add lockdep support
for that to avoid regressing :)
johannes
Download attachment "signature.asc" of type "application/pgp-signature" (191 bytes)
Powered by blists - more mailing lists