Message-ID: <20070704125219.GA98@tv-sign.ru>
Date: Wed, 4 Jul 2007 16:52:19 +0400
From: Oleg Nesterov <oleg@...sign.ru>
To: Johannes Berg <johannes@...solutions.net>
Cc: Ingo Molnar <mingo@...hat.com>,
Arjan van de Ven <arjan@...ux.intel.com>,
Linux Kernel list <linux-kernel@...r.kernel.org>,
linux-wireless <linux-wireless@...r.kernel.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>, mingo@...e.hu,
Thomas Sattler <tsattler@....de>
Subject: Re: [RFC/PATCH] debug workqueue deadlocks with lockdep
On 07/04, Johannes Berg wrote:
>
> On Tue, 2007-07-03 at 21:31 +0400, Oleg Nesterov wrote:
>
> > If A does NOT take a lock L1, then it is OK to do cancel_work_sync(A)
> > under L1, regardless of which other work_structs this workqueue has,
> > before or after A.
>
> Ah, cancel_work_sync() waits only for it if A is currently running?
Yes. And no other work (except a barrier) can run before the caller of
wait_on_work() is woken.
> > Now we have a false positive if some time we queue B into that workqueue,
> > and this is not good.
>
> Right. I was thinking of the flush_workqueue case where any of A or B
> matters.
Aha, now I see where I was confused. Yes, we can't avoid the false positives
with flush_workqueue().
I hope this won't be a problem, because almost every usage of flush_workqueue()
is pointless nowadays. So even if we have a false positive, it probably
means the code needs cleanups anyway.
But see below,
> > We can avoid this problem if we put lockdep_map into work_struct, so
> > that wait_on_work() "locks" work->lockdep_map, while flush_workqueue()
> > takes wq->lockdep_map.
>
> Yeah, and then we'll take both wq->lockdep_map and the
> work_struct->lockdep_map when running that work. That should work, I'll
> give it a go later.
If you are going to do this, may I suggest that you make 2 separate patches?
Exactly because we can't avoid the false positives with flush_workqueue(),
it would be nice to have the option of reverting the second patch if there
are too many false positives (I hope this won't happen).
(please ignore if this is not suitable for you).
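
To make the difference between the two maps concrete, here is a toy
user-space model, not kernel code. It assumes a lock validator can be
approximated by a graph of "B was acquired while A was held" edges. All
names in it (depgraph, run_work, would_deadlock, the enum values) are
invented for illustration and are not lockdep or workqueue APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy lock classes: one real lock, one per-workqueue map, and one
 * per-work map for each of the works A and B (all hypothetical). */
enum { L1, WQ_MAP, WORK_A_MAP, WORK_B_MAP, NCLASS };

/* dep[a][b] means "b was acquired while a was held". */
struct depgraph { bool dep[NCLASS][NCLASS]; };

static void graph_init(struct depgraph *g)
{
	memset(g, 0, sizeof(*g));
}

static void add_dep(struct depgraph *g, int outer, int inner)
{
	g->dep[outer][inner] = true;
}

/* Depth-first search: is 'to' reachable from 'from'? */
static bool reaches(const struct depgraph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NCLASS; next++)
		if (g->dep[from][next] && !seen[next] &&
		    reaches(g, next, to, seen))
			return true;
	return false;
}

/* Would taking 'inner' while holding 'outer' close a cycle? */
static bool would_deadlock(const struct depgraph *g, int outer, int inner)
{
	bool seen[NCLASS] = { false };

	return reaches(g, inner, outer, seen);
}

/* Model of running one work: the worker "holds" the workqueue map and
 * the work's own map around the callback, which may take L1. */
static void run_work(struct depgraph *g, int work_map, bool takes_l1)
{
	add_dep(g, WQ_MAP, work_map);
	if (takes_l1) {
		add_dep(g, WQ_MAP, L1);
		add_dep(g, work_map, L1);
	}
}

/* A never takes L1, B does. */
static void check_scenario(void)
{
	struct depgraph g;

	graph_init(&g);
	run_work(&g, WORK_A_MAP, false);
	run_work(&g, WORK_B_MAP, true);

	/* Waiting for A alone under L1 "locks" only A's map: no report. */
	assert(!would_deadlock(&g, L1, WORK_A_MAP));
	/* flush_workqueue() under L1 takes the wq map: reported, because of B. */
	assert(would_deadlock(&g, L1, WQ_MAP));
}
```

In this model, waiting for A under L1 is clean as long as A itself never
takes L1, while flushing the whole workqueue under L1 is flagged as soon as
any work on it takes L1.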
> > > @@ -257,7 +260,9 @@ static void run_workqueue(struct cpu_wor
> > >
> > > BUG_ON(get_wq_data(work) != cwq);
> > > work_clear_pending(work);
> > > + lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> > > f(work);
> > > + lock_release(&cwq->wq->lockdep_map, 0, _THIS_IP_);
> > ^^^
> > Isn't it better to call lock_release() with nested == 1 ?
>
> Not sure, Ingo?
Ingo, could you also explain the meaning of the "nested" parameter? It looks
like it is simply unneeded: lock_release_nested() does a quick check
and uses lock_release_non_nested() when hlock is not on top of the stack.
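
To illustrate the fallback I mean, here is a toy user-space sketch of a
held-lock stack. It is not the lockdep code; the names (held_stack,
toy_acquire, toy_release_nested, toy_release_non_nested) and the counters
are made up for the example. The only point is that the "nested" release
checks the top of the stack and otherwise hands off to the general path,
so a caller gets a correct result either way.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_HELD 8

/* Toy per-task stack of held locks, plus counters for the two paths. */
struct held_stack {
	const void *held[MAX_HELD];
	int depth;
	int fast_releases;
	int slow_releases;
};

static bool toy_acquire(struct held_stack *s, const void *lock)
{
	if (s->depth >= MAX_HELD)
		return false;
	s->held[s->depth++] = lock;
	return true;
}

/* General path: find the lock anywhere in the stack, remove it and
 * shift the entries above it down by one. */
static bool toy_release_non_nested(struct held_stack *s, const void *lock)
{
	for (int i = s->depth - 1; i >= 0; i--) {
		if (s->held[i] != lock)
			continue;
		for (int j = i; j < s->depth - 1; j++)
			s->held[j] = s->held[j + 1];
		s->depth--;
		s->slow_releases++;
		return true;
	}
	return false;	/* lock was not held */
}

/* Fast path: pop the top entry if it matches, else fall back. */
static bool toy_release_nested(struct held_stack *s, const void *lock)
{
	if (s->depth == 0)
		return false;
	if (s->held[s->depth - 1] != lock)
		return toy_release_non_nested(s, lock);
	s->depth--;
	s->fast_releases++;
	return true;
}
```

With such a fallback in place, always asking for the nested release costs
one extra comparison in the out-of-order case and nothing otherwise.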
Oleg.