netdev - Re: 2.6.25rc7 lockdep trace

Message-ID: <20080402085126.GA96@tv-sign.ru>
Date:	Wed, 2 Apr 2008 12:51:26 +0400
From:	Oleg Nesterov <oleg@...sign.ru>
To:	David Miller <davem@...emloft.net>
Cc:	johannes@...solutions.net, davej@...emonkey.org.uk,
	netdev@...r.kernel.org
Subject: Re: 2.6.25rc7 lockdep trace

On 03/28, David Miller wrote:
>
> From: Johannes Berg <johannes@...solutions.net>
> Date: Sat, 29 Mar 2008 02:01:25 +0100
> 
> > 
> > > > You can't flush a workqueue in the device close handler
> > > > exactly because of this locking conflict.
> > > > 
> > > > Nobody has come up with a suitable way to fix this yet.
> > > 
> > > Maybe we should check which schedule_work users actually lock the rtnl
> > > within the work function and move them to a uses-rtnl-in-work workqueue
> > > so that everybody else can have rtnl around flush.
> > 
> > On the other hand, most drivers don't actually care that their work has
> > run, they just care that it won't run in the future after they give up
> > resources or similar, hence they can and should use cancel_work_sync()
> > which doesn't suffer from the deadlock. But that needs actual inspection
> > because it does change behaviour from "run and wait for it if scheduled"
> > to "cancel if scheduled".
> 
> I don't see how you can not race with the transition from
> scheduled to "executing" without taking the runqueue lock
> for the testing.

Yes, cancel_work_sync() takes cwq->lock, but this is fine (unless it is buggy ;)
Please note that run_workqueue() drops this lock before calling work->func(),
so cwq->lock is never held while a work function runs.

If the caller of cancel_work_sync(work) doesn't share locks with work->func()
we can't deadlock, even if there are other pending/running work_structs which
need the same locks as the caller (say, RTNL).
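A minimal userspace sketch of this argument, assuming one worker thread and pthread mutexes standing in for cwq->lock and RTNL. The names (struct wq, wq_queue, wq_worker, wq_cancel_sync, run_demo) are hypothetical and this is not the kernel's workqueue code; it only mirrors the point above that the queue lock is dropped before func() runs, so cancelling an item that does not take RTNL cannot deadlock even while another queued item is blocked on RTNL.

```c
#include <pthread.h>
#include <stddef.h>

/* Toy one-worker workqueue (hypothetical names, not the kernel API).
 * Only the locking pattern follows the mail: the queue lock, standing
 * in for cwq->lock, is dropped before func() is called. */
struct work {
    void (*func)(struct work *);
    struct work *next;
    int runs;
};

struct wq {
    pthread_mutex_t lock;   /* stands in for cwq->lock */
    pthread_cond_t cond;    /* signalled when a func() finishes */
    struct work *head;      /* pending items, FIFO */
    struct work *current;   /* item whose func() is executing, if any */
};

/* Append w at the tail; w must not already be queued. */
static void wq_queue(struct wq *q, struct work *w)
{
    struct work **p;

    pthread_mutex_lock(&q->lock);
    for (p = &q->head; *p; p = &(*p)->next)
        ;
    w->next = NULL;
    *p = w;
    pthread_mutex_unlock(&q->lock);
}

/* Drain the queue, then exit (this demo queues everything up front). */
static void *wq_worker(void *arg)
{
    struct wq *q = arg;

    pthread_mutex_lock(&q->lock);
    while (q->head) {
        struct work *w = q->head;
        q->head = w->next;
        q->current = w;
        pthread_mutex_unlock(&q->lock); /* drop the queue lock ... */
        w->func(w);                     /* ... before calling func() */
        pthread_mutex_lock(&q->lock);
        q->current = NULL;
        pthread_cond_broadcast(&q->cond);
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

/* Dequeue w if pending, then wait only for w itself if it is running. */
static void wq_cancel_sync(struct wq *q, struct work *w)
{
    pthread_mutex_lock(&q->lock);
    for (struct work **p = &q->head; *p; p = &(*p)->next)
        if (*p == w) {                  /* still pending: dequeue it */
            *p = w->next;
            break;
        }
    while (q->current == w)             /* running: wait for it to end */
        pthread_cond_wait(&q->cond, &q->lock);
    pthread_mutex_unlock(&q->lock);
}

static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER; /* "RTNL" */

static void needs_rtnl(struct work *w)
{
    pthread_mutex_lock(&rtnl);
    w->runs++;
    pthread_mutex_unlock(&rtnl);
}

static void no_rtnl(struct work *w) { w->runs++; }

/* A "close handler" holds RTNL and cancels b while a, which needs RTNL,
 * is queued ahead of it. Returning at all means no deadlock occurred. */
static void run_demo(int *a_runs, int *b_runs)
{
    struct wq q = { .lock = PTHREAD_MUTEX_INITIALIZER,
                    .cond = PTHREAD_COND_INITIALIZER };
    struct work a = { .func = needs_rtnl }, b = { .func = no_rtnl };
    pthread_t t;

    pthread_mutex_lock(&rtnl);
    wq_queue(&q, &a);
    wq_queue(&q, &b);
    pthread_create(&t, NULL, wq_worker, &q);
    wq_cancel_sync(&q, &b);             /* never waits for a */
    pthread_mutex_unlock(&rtnl);
    pthread_join(t, NULL);
    *a_runs = a.runs;
    *b_runs = b.runs;
}
```

Here run_demo() returns with a run once and b never run. A flush-style wait for every queued item, made while holding rtnl, would instead hang: it would have to wait for a, and a cannot finish until the caller drops rtnl.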

But perhaps you mean wq->lockdep_map? As Johannes pointed out, this lock is
fake, but I don't think that matters: from the correctness point of view it is
a "real" lock. What does matter is that cancel_work_sync() doesn't use this
lock at all.

(Again, Johannes has already explained all of this.)
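To see why the fake lock matters to lockdep, here is a toy lock-order graph, assuming (as in the 2.6.25-era annotations) that run_workqueue() holds wq->lockdep_map around work->func() and that flush_workqueue() acquires it as well. RTNL, WQ_MAP, struct lock_order, reaches() and take_while_holding() are hypothetical names; real lockdep tracks lock classes and far more state than this.

```c
#include <stdbool.h>
#include <string.h>

/* Toy lock-order checker in the spirit of lockdep (hypothetical and far
 * simpler than the real one): remember "held -> acquired" edges and
 * report when a new edge closes a cycle. */
enum { RTNL, WQ_MAP, NLOCKS };  /* WQ_MAP models wq->lockdep_map */

struct lock_order {
    bool edge[NLOCKS][NLOCKS];  /* edge[a][b]: b was taken while holding a */
};

/* Depth-first search: is `to` reachable from `from`? */
static bool reaches(const struct lock_order *o, int from, int to, bool *seen)
{
    if (from == to)
        return true;
    if (seen[from])
        return false;
    seen[from] = true;
    for (int n = 0; n < NLOCKS; n++)
        if (o->edge[from][n] && reaches(o, n, to, seen))
            return true;
    return false;
}

/* Record acquiring `acquired` while holding `held`. Returns true if `held`
 * is already reachable from `acquired`, i.e. the new edge closes a cycle
 * (a potential deadlock). */
static bool take_while_holding(struct lock_order *o, int held, int acquired)
{
    bool seen[NLOCKS];
    bool cycle;

    memset(seen, 0, sizeof(seen));
    cycle = reaches(o, acquired, held, seen);
    o->edge[held][acquired] = true;
    return cycle;
}
```

Recording WQ_MAP -> RTNL (the worker) and then RTNL -> WQ_MAP (a close handler that flushes) reports a cycle. If the close handler instead uses cancel_work_sync(), which per the mail does not use wq->lockdep_map, only the first edge exists and nothing is reported.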

> And it is crucial that the workqueue function doesn't
> execute "accidentally" due to such a race before the module
> and thus the workqueue code is about to get potentially
> unloaded.

Which race? Unless it is explicitly queued again afterwards, work->func() can't
execute after cancel_work_sync(work) returns.
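A single-threaded toy model of that guarantee. The names (struct toy_work, toy_queue, toy_run_pending, toy_cancel) are hypothetical, and the "sync" part, waiting for a callback that is already running, is left out because nothing runs concurrently here. Once the pending state is cleared, the worker has nothing to run until someone queues the item again.

```c
#include <stdbool.h>

/* Toy model, not the kernel API: a single work item that is either
 * pending or idle, plus a count of how many times its func has run. */
struct toy_work {
    bool pending;
    int runs;
};

static void toy_queue(struct toy_work *w) { w->pending = true; }

/* Stand-in for the worker picking the item up: runs func only if pending. */
static void toy_run_pending(struct toy_work *w)
{
    if (w->pending) {
        w->pending = false;
        w->runs++;          /* "work->func()" */
    }
}

/* "Cancel if scheduled": clear the pending state without running func. */
static void toy_cancel(struct toy_work *w) { w->pending = false; }
```

Queue, cancel, then let the worker look: runs stays 0. Only an explicit toy_queue() after the cancel makes the next worker pass run func.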

David, I think you misunderstood Johannes, or perhaps I missed something.

Oleg.
