linux-kernel - Re: USB device cannot be reconnected and khubd "blocked for more than 120 seconds"

Message-ID: <20130115235043.GJ2668@htj.dyndns.org>
Date:	Tue, 15 Jan 2013 15:50:43 -0800
From:	Tejun Heo <tj@...nel.org>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Ming Lei <ming.lei@...onical.com>,
	Alex Riesen <raa.lkml@...il.com>,
	Alan Stern <stern@...land.harvard.edu>,
	Jens Axboe <axboe@...nel.dk>,
	USB list <linux-usb@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Arjan van de Ven <arjan@...ux.intel.com>
Subject: Re: USB device cannot be reconnected and khubd "blocked for more
 than 120 seconds"

cc'ing Arjan.  Arjan, the original thread can be read from

  http://thread.gmane.org/gmane.linux.kernel/1420814

Hello, again.

On Tue, Jan 15, 2013 at 12:18:01PM -0800, Linus Torvalds wrote:
> I think that is a good solution if it works, but look out: we need to
> synchronize across *all* domains, not just the default one.  The sd.c
> code, for example, uses its own "scsi_sd_probe_domain", and we *do*
> want to synchronize with it.
> 
> Can you do that with your suggested interface (ie it would have to be
> a *global* sequence number)?

So, I've been thinking about it for a while now and it looks like
async is cutting too many corners to implement any sane stackable
flushing scheme on top.  There simply isn't enough information to
determine who should wait for what.

I've thought of two workarounds.  Both suck.

A. Try to detect deadlock conditions from synchronize().  If a deadlock
   condition involving other async jobs is detected, whine about it
   and then skip the wait.  Ignore deadlock conditions on self (this
   should solve this particular case).

   Detecting deadlock conditions isn't difficult if there are only
   global synchronizations; unfortunately, fragmented dependencies via
   domain-local synchronization make this non-trivial.

   We can still do the ignore-self thing mostly trivially tho.  This
   will at least work around the problem at hand.

B. The ranged synchronization I first suggested.  The problem with
   this is that it's a common practice for a given async job to try to
   flush anything which comes before it.  This can introduce spurious
   synchronization dependencies which can then lead to deadlocks.

   These conditions can be detected and ignored, at least when
   considering only global synchronizations.  The problem here is that
   those deadlock conditions will occur under normal usage and thus
   have to be ignored silently, which basically makes synchronization
   silently give up and finish successfully even if there are
   legitimate deadlocks which should be investigated.
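
To make the spurious dependency in B concrete, here is a rough userspace
sketch.  It is a toy model only: struct toy_job, toy_in_range() and
toy_ranged_flush_deadlocks() are made-up names, not kernel interfaces,
and the waits_for field is an assumption added so the model has
something to detect (the real cookie space records no such thing).  It
only catches direct, two-job cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of jobs sharing one global cookie space. */
struct toy_job {
	unsigned long cookie;     /* position in the cookie space */
	unsigned long waits_for;  /* cookie this job blocks on, 0 if none */
	bool done;                /* has the job finished? */
};

/* Ranged flush: would a flush issued at @cookie have to wait for @job? */
static bool toy_in_range(const struct toy_job *job, unsigned long cookie)
{
	return !job->done && job->cookie < cookie;
}

/*
 * Return true if "flush everything before me" issued by the job holding
 * @cookie deadlocks directly, i.e. some earlier pending job is itself
 * blocked waiting for @cookie.  Longer cycles are not considered.
 */
bool toy_ranged_flush_deadlocks(const struct toy_job *jobs, size_t n,
				unsigned long cookie)
{
	assert(jobs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (toy_in_range(&jobs[i], cookie) &&
		    jobs[i].waits_for == cookie)
			return true;
	}
	return false;
}
```

If job 1 blocks on job 2 and job 2 then flushes everything before it,
the function reports a deadlock even though job 2 never cared about
job 1 in particular, which is the kind of case that would have to be
ignored silently.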

For now, I'm gonna implement simple "I'm not gonna wait for myself"
self-deadlock avoidance.  If this needs any more sophistication, I
think we'd better reimplement it so that we can explicitly match up
and track who's gonna wait for what instead of throwing everything
into a single cookie space and then trying to work back from there.
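
The "not gonna wait for myself" rule can be sketched in a few lines of
userspace C.  Again this is only an illustration under assumptions:
struct toy_entry, TOY_NO_COOKIE and toy_sync_must_wait() are
hypothetical names, not the kernel's async API, and the model assumes
the caller knows its own cookie.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: cookie value meaning "caller is not an async job". */
#define TOY_NO_COOKIE 0UL

struct toy_entry {
	unsigned long cookie;  /* sequence number in the cookie space */
	int domain;            /* synchronization domain it belongs to */
	bool done;             /* has the job finished? */
};

/*
 * Return true if a full synchronization across all domains, issued by
 * the job holding @self_cookie, still has something to wait for.  The
 * caller's own pending entry is skipped, so a job that flushes
 * everything never ends up waiting on itself.
 */
bool toy_sync_must_wait(const struct toy_entry *entries, size_t n,
			unsigned long self_cookie)
{
	assert(entries != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (entries[i].done)
			continue;
		if (self_cookie != TOY_NO_COOKIE &&
		    entries[i].cookie == self_cookie)
			continue;  /* ignore self */
		return true;
	}
	return false;
}
```

Note that the domain field plays no part in the check: the scan covers
every entry regardless of domain, and only the caller's own cookie is
exempted.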

Thanks.

-- 
tejun