Message-Id: <1206784948.22530.128.camel@johannes.berg>
Date: Sat, 29 Mar 2008 11:02:28 +0100
From: Johannes Berg <johannes@...solutions.net>
To: David Miller <davem@...emloft.net>
Cc: davej@...emonkey.org.uk, netdev@...r.kernel.org
Subject: Re: 2.6.25rc7 lockdep trace
[merging your two mails]
> The issue is that linkwatch goes: workqueue lock --> RTNL
>
> And thus taking workqueue lock within RTNL is deadlock
> prone no matter where you do it.
Which lock are you referring to here? There is no real runqueue lock,
other than a spinlock protecting the list. I think you may be
misunderstanding the lockdep output in a way I didn't anticipate when I
added lockdep debugging for the workqueue code.
The actual work function is _not_ running with any locks held! The
lockdep output you see there appears because we tell lockdep that we
are holding two locks, namely the "work" and the "workqueue". These are
fake; they aren't actually locks.

More importantly, those "locks" are locked and unlocked around every
_run_ of the struct work function, not around the whole runqueue
manipulation; that part is done atomically, without global locks.
So what you have is essentially

    list_for_each_work
        lock(workqueue)
        lock(struct work)
        work->f();
        unlock(struct work)
        unlock(workqueue)
where both of those "locks" exist only to tell lockdep about the
locking so it can warn about exactly this deadlock scenario.
Unfortunately, Andrew mangled the two patches into one when committing,
so I can't point to the changelog, but here are links to the two
patches that added this:
http://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc1/2.6.23-rc1-mm2/broken-out/workqueue-debug-flushing-deadlocks-with-lockdep.patch
http://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc1/2.6.23-rc1-mm2/broken-out/workqueue-debug-work-related-deadlocks-with-lockdep.patch
I think the "runqueue lock" you're referring to is the fake
"lock(workqueue)", which isn't a lock around the whole runqueue
manipulation at all, but only a lockdep helper to debug exactly the
deadlock problem we're talking about. I'm open to renaming it, but I
don't think lockdep supports formatting its output differently.
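For illustration, here is a simplified sketch of the relevant part of
run_workqueue() -- written from memory, so names and the exact lockdep
call arguments may not match your tree:

    static void run_workqueue(struct cpu_workqueue_struct *cwq)
    {
        spin_lock_irq(&cwq->lock);
        while (!list_empty(&cwq->worklist)) {
            struct work_struct *work =
                list_entry(cwq->worklist.next,
                           struct work_struct, entry);
            work_func_t f = work->func;
            /* copy, the work may be freed by f() */
            struct lockdep_map lockdep_map = work->lockdep_map;

            list_del_init(cwq->worklist.next);
            spin_unlock_irq(&cwq->lock);

            /* tell lockdep we "hold" the two fake locks;
             * no real lock is held across f(work) */
            lock_acquire(&cwq->wq->lockdep_map,
                         0, 0, 0, 2, _THIS_IP_);
            lock_acquire(&lockdep_map, 0, 0, 0, 2, _THIS_IP_);
            f(work);
            lock_release(&lockdep_map, 1, _THIS_IP_);
            lock_release(&cwq->wq->lockdep_map, 1, _THIS_IP_);

            spin_lock_irq(&cwq->lock);
        }
        spin_unlock_irq(&cwq->lock);
    }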
> I don't see how you can not race with the transition from
> scheduled to "executing" without taking the runqueue lock
> for the testing.
When you call cancel_work_sync(), the work struct is grabbed by the
code (really __cancel_work_timer()) and removed from the queue. That
operates only on the pending bit and a spinlock, not on any lock held
across the execution of the struct work function, and that is enough to
make it race-free without any such locks.
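The core of that grabbing is roughly this (a simplified sketch of
try_to_grab_pending(), again from memory, details may differ):

    static int try_to_grab_pending(struct work_struct *work)
    {
        struct cpu_workqueue_struct *cwq;
        int ret = -1;

        /* not pending? then we now own it, nobody can queue it */
        if (!test_and_set_bit(WORK_STRUCT_PENDING,
                              work_data_bits(work)))
            return 0;

        /* queued: steal it off ->worklist under the per-CPU
         * spinlock -- no "runqueue lock" anywhere */
        cwq = get_wq_data(work);
        if (!cwq)
            return ret;

        spin_lock_irq(&cwq->lock);
        if (!list_empty(&work->entry)) {
            list_del_init(&work->entry);
            ret = 1;
        }
        spin_unlock_irq(&cwq->lock);

        return ret;
    }

After that, __cancel_work_timer() only waits for a possibly already
running instance, by queueing a barrier work and waiting for its
completion -- again without taking any lock that is held across work
functions.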
> And it is crucial that the workqueue function doesn't
> execute "accidently" due to such a race before the module
> and thus the workqueue code is about to get potentially
> unloaded.
That is exactly what cancel_work_sync() guarantees, but no more. By
contrast, flush_workqueue() also guarantees that the work is run one
last time if it was scheduled at the time of the flush_workqueue()
call. However, as I just tried to explain, cancel_work_sync() _is_ safe
to run while holding the RTNL, because it doesn't need any runqueue
lock.
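So a driver can do something like this from under the RTNL
(hypothetical foo_* names, just to illustrate; the one requirement is
that the work function being cancelled doesn't itself take the RTNL):

    /* called under RTNL, e.g. from the netdev stop path */
    static int foo_stop(struct net_device *dev)
    {
        struct foo_priv *priv = netdev_priv(dev);

        ASSERT_RTNL();

        /*
         * Safe despite holding the RTNL: this only uses the
         * pending bit, the per-CPU list spinlock and a barrier
         * work to wait for a running instance. Afterwards the
         * work is neither queued nor running.
         */
        cancel_work_sync(&priv->reset_work);

        return 0;
    }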
johannes