Date:	Tue, 20 Feb 2007 09:23:41 +0100
From:	Jarek Poplawski <jarkao2@...pl>
To:	Ben Greear <greearb@...delatech.com>
Cc:	Stephen Hemminger <shemminger@...ux-foundation.org>,
	Francois Romieu <romieu@...zoreil.com>, netdev@...r.kernel.org,
	Kyle Lucke <klucke@...ibm.com>,
	Raghavendra Koushik <raghavendra.koushik@...erion.com>,
	Al Viro <viro@....linux.org.uk>, Ingo Molnar <mingo@...e.hu>
Subject: Re: [BUG] RTNL and flush_scheduled_work deadlocks

On Fri, Feb 16, 2007 at 08:06:25AM -0800, Ben Greear wrote:
...
> Well, I had lockdep and all of the locking debugging I could find
> enabled, but it did not catch this problem. I had to use sysctl -t
> and manually dig through the backtraces to find the deadlock....
> 
> It may be that lockdep could be enhanced to catch this sort of thing....

I think you are really good at tracing very interesting
(subtle) problems.

I guess the scenario is like this:

1) some process takes some lock (e.g. RTNL), 
2) kthread runs a work function, which tries to get the
   same lock,
3) the process with the lock calls flush_scheduled_work,
4) the flush_cpu_workqueue waits for kthread to finish.

So the process from step 1, still holding the lock, waits
for the kthread to finish, while the kthread waits for the
lock held by that process.
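
The four steps above can be reproduced in a small userspace model.
This is only a sketch under assumptions, not kernel code: a Python
thread stands in for the kthread, a threading.Lock stands in for
RTNL, and the hypothetical helper flush_while_holding_lock() plays
the role of flush_scheduled_work. The wait is bounded by a timeout
so that the demo terminates instead of hanging like the real case.

```python
import threading


def flush_while_holding_lock(hold_lock: bool, flush_timeout: float = 0.5) -> bool:
    """Return True if the simulated flush finished in time, False if it got stuck.

    Hypothetical model only: `rtnl` stands in for RTNL and `worker`
    stands in for the workqueue kthread.
    """
    rtnl = threading.Lock()

    def work_fn():
        # Step 2: the work function needs the same lock.
        with rtnl:
            pass

    worker = threading.Thread(target=work_fn)

    if hold_lock:
        rtnl.acquire()  # Step 1: the process takes the lock.
    worker.start()

    # Steps 3 and 4: the "flush" waits for the worker to finish.
    # The real flush waits forever; the timeout keeps the demo finite.
    worker.join(timeout=flush_timeout)
    finished = not worker.is_alive()

    if hold_lock:
        rtnl.release()  # Break the deadlock so the demo can exit.
    worker.join()
    return finished
```

Calling flush_while_holding_lock(True) models the reported case: the
worker is still blocked on the lock when the bounded wait expires.
Calling it with False models a flush done without the lock held,
which completes normally.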

Of course it's a lockup, similar to a circular dependency
but not the same: there is only one lock. I don't think
lockdep can be blamed here. The flush is not a lock
operation, so lockdep cannot know why the first process
is waiting.

In my opinion the solution should be looked for in the
workqueue code. My idea: maybe there should be some
additional lock, taken by the kthread before running the
work functions and by any process calling the flush. Then
lockdep should have no problem seeing this dependency.
This lock could be under #ifdef DEBUG_LOCK... so it exists
only where it can be analyzed. Of course there may be some
simpler solution to this otherwise hard to track problem.
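
To illustrate why an extra lock would make the dependency visible,
here is a toy model of a lock-order checker. It is a sketch under
assumptions and not how lockdep is implemented: the LockGraph class,
the lockdep_sees_deadlock() helper, and the lock name "wq_lock" are
all hypothetical. The model records an edge A -> B whenever B is
taken while A is held, and reports a problem when the edges form a
cycle.

```python
from collections import defaultdict


class LockGraph:
    """Toy lock-order graph: edge A -> B means B was taken while holding A."""

    def __init__(self):
        self.edges = defaultdict(set)

    def run(self, acquisitions):
        """Record one context taking the given locks in order, nested."""
        held = []
        for lock in acquisitions:
            for h in held:
                self.edges[h].add(lock)
            held.append(lock)

    def has_cycle(self):
        """Depth-first search for a cycle in the recorded lock order."""
        WHITE, GREY, BLACK = 0, 1, 2
        color = defaultdict(int)

        def visit(node):
            color[node] = GREY
            for nxt in self.edges.get(node, ()):
                if color[nxt] == GREY:
                    return True
                if color[nxt] == WHITE and visit(nxt):
                    return True
            color[node] = BLACK
            return False

        return any(color[n] == WHITE and visit(n) for n in list(self.edges))


def lockdep_sees_deadlock(with_wq_lock: bool) -> bool:
    """Model the scenario with or without the hypothetical extra lock."""
    g = LockGraph()
    if with_wq_lock:
        # kthread: takes wq_lock, then the work function takes rtnl.
        g.run(["wq_lock", "rtnl"])
        # process: holds rtnl, then the flush takes wq_lock.
        g.run(["rtnl", "wq_lock"])
    else:
        # Without the extra lock the flush is invisible to the checker:
        # both contexts only ever take rtnl, so no ordering is recorded.
        g.run(["rtnl"])
        g.run(["rtnl"])
    return g.has_cycle()
```

In this model lockdep_sees_deadlock(False) finds nothing, because a
single lock cannot form an ordering cycle, while
lockdep_sees_deadlock(True) finds the rtnl -> wq_lock -> rtnl cycle.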

I'm CCing this message to Ingo Molnar and hope he can find
some time to think about it.

Regards,
Jarek P.
