linux-kernel - Re: USB device cannot be reconnected and khubd "blocked for more than 120 seconds"


Message-ID: <CA+55aFz1T8t57zTiEKYJKS9crNS99gpp+SxQ41A9TKj2hoq72w@mail.gmail.com>
Date:	Mon, 14 Jan 2013 10:34:57 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Alan Stern <stern@...land.harvard.edu>
Cc:	Ming Lei <ming.lei@...onical.com>,
	Alex Riesen <raa.lkml@...il.com>, Jens Axboe <axboe@...nel.dk>,
	USB list <linux-usb@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: USB device cannot be reconnected and khubd "blocked for more than
 120 seconds"

On Mon, Jan 14, 2013 at 10:04 AM, Alan Stern <stern@...land.harvard.edu> wrote:
>
> How about skipping that call if the current thread is one of the async
> helpers?  Is it possible to detect when that happens?
>
> Or maybe such a check should go inside async_synchronize_full() itself.

Do we have some idea of exactly what is waiting for what? Which async
context is causing the module load to happen in the first place?

I think *that* is what we should avoid - it sounds like the block
layer is loading the IO scheduler at the wrong point. I realize that
people like (for testing purposes) to change the IO scheduler at
random, but if that means that any IO can basically result in a
request_module(), then that sounds like a problem.

It seems to be "elevator_get()", and I presume the chain is something
like "load block driver async, the block driver does
blk_init_allocated_queue, that does request_module() to find the
elevator, the request_module() succeeds, but ends up waiting for async
work, which is the block driver load, which is waiting for the
request_module to finish".
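
The presumed chain above can be written down as a tiny wait-for graph.
What follows is only a user-space sketch, not kernel code: the node
names, the waits_on[] table and wait_chain_deadlocks() are all
hypothetical, and the edges encode the guess made in this mail rather
than a confirmed trace.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three parties in the presumed chain. */
enum node { ASYNC_PROBE, REQUEST_MODULE, ASYNC_SYNC_FULL, NODE_COUNT };

/* waits_on[n] is the node that n blocks on, or -1 if n blocks on nothing. */
static const int waits_on[NODE_COUNT] = {
	[ASYNC_PROBE]     = REQUEST_MODULE,  /* probe loads the elevator module */
	[REQUEST_MODULE]  = ASYNC_SYNC_FULL, /* module load waits for async work */
	[ASYNC_SYNC_FULL] = ASYNC_PROBE,     /* async work includes the probe */
};

/*
 * Follow the wait-for edges from 'start'. Returns true if the walk comes
 * back to 'start', i.e. 'start' ends up waiting on itself.
 */
static bool wait_chain_deadlocks(const int *edges, int count, int start)
{
	int cur = start;

	assert(start >= 0 && start < count);
	for (int steps = 0; steps < count; steps++) {
		cur = edges[cur];
		if (cur < 0)
			return false;   /* chain ends, nobody is stuck */
		if (cur == start)
			return true;    /* back where we began: a cycle */
	}
	return false;
}
```

With the table as given, the walk from ASYNC_PROBE returns to itself.
Removing the probe's edge to REQUEST_MODULE (for example by deferring the
module load to open time) breaks the cycle.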

And my gut feel is that blk_init_allocated_queue() probably shouldn't
do that request_module() at all. We might want to do it when we *open*
the device, but not while loading the module for the device.

So my _feeling_ is that this is just a bug in the block layer, and
that it shouldn't set up block device drivers for this kind of crazy
"need to load the elevator module while in the middle of scanning
devices". I think *that* is what we should aim to change.

Hmm?

That said, I think it might indeed be a good idea to make this problem
much easier to see, and that "detect when it happens" would be a good
thing (and then we should WARN_ON_ONCE() on people trying to do
request_module() calls from async context).
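
As a rough illustration of that "detect when it happens" idea, here is a
user-space model of a warn-once check. Everything in it is an
assumption: in_async_context, warn_on_once_model(),
request_module_model() and run_async_model() are hypothetical stand-ins,
not the kernel's WARN_ON_ONCE() or request_module(), and a real check
would need a per-thread way to tell whether the caller is an async
helper.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical marker: true while an "async helper" is running.
 * A real implementation would track this per thread. */
static bool in_async_context;

/* Counts warnings actually printed, so the warn-once behaviour is visible. */
static int warnings_emitted;

/* Model of a warn-once check: report the first time cond is true. */
static bool warn_on_once_model(bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		warnings_emitted++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Model of a module request: warns if issued from async context,
 * then carries on (this sketch only diagnoses, it does not refuse). */
static int request_module_model(const char *name)
{
	warn_on_once_model(in_async_context,
			   "request_module() called from async context");
	(void)name;     /* the model does not load anything */
	return 0;
}

/* Model of running a callback as async work. */
static int run_async_model(int (*fn)(const char *), const char *arg)
{
	int ret;

	in_async_context = true;
	ret = fn(arg);
	in_async_context = false;
	return ret;
}
```

Calling request_module_model() directly stays silent. Calling it through
run_async_model() prints the warning the first time only.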

               Linus