lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 15 Jan 2013 09:53:59 +0800
From:	Ming Lei <ming.lei@...onical.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Alex Riesen <raa.lkml@...il.com>,
	Alan Stern <stern@...land.harvard.edu>,
	Jens Axboe <axboe@...nel.dk>,
	USB list <linux-usb@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: USB device cannot be reconnected and khubd "blocked for more than
 120 seconds"

On Tue, Jan 15, 2013 at 1:30 AM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
> On Sun, Jan 13, 2013 at 11:15 PM, Ming Lei <ming.lei@...onical.com> wrote:
>>
>> The deadlock problem is caused by calling request_module() inside
>> async function of do_scan_async(), and it was introduced by Linus's
>> below commit:
>>
>> commit d6de2c80e9d758d2e36c21699117db6178c0f517
>> Author: Linus Torvalds <torvalds@...ux-foundation.org>
>> Date:   Fri Apr 10 12:17:41 2009 -0700
>>
>>     async: Fix module loading async-work regression
>>
>> IMO, maybe the commit isn't a proper fix, considered the
>> below fact:
>>
>> - it isn't good to allow async function to be marked as __init
>
> Immaterial. For modules, __init is a non-issue. For non-modules, the
> synchronization elsewhere is sufficient.

Looks 5d38258ec026921a7b266f4047ebeaa75db358e5(ACPI battery:
fix async boot oops) addresses the issue of __init for modules.

>
>> - any user mode shouldn't expect that the device is ready just
>> after completing of 'insmod'
>
> Bullshit. That expectation is just a fact. People insmod a device
> driver, and mount the device immediately in scripts.

I mean we can let the device node populated in probe() first,
but let open() wait for completion of the async probe(). Maybe my
expression is not accurate, here the 'device isn't ready' just means
that the async probe() isn't completed, and doesn't mean the device
node doesn't come.

>
> We do not say "user mode shouldn't". Seriously. EVER. User mode
> *does*, and we deal with it. Learn it now, and stop ever saying that
> again.
>
> This is really starting to annoy me. Kernel developers who say "user
> mode should be fixes to not do that" should go somewhere else. The
> whole and *only* point of a kernel is to hide these kinds of issues
> from user mode, and make things "just work" in user mode. User mode
> should not ever worry about "oh, doing X can trigger a module load, so
> now the device might not be available immediately, so I should delay
> and re-try until it is".
>
> That's just f*cking crazy talk.
>
> We have a very simple rule in the kernel: we don't break user space. EVER.

No, I don't mean we should break user space, see above.

>
> Learn that rule. I don't ever want to hear "any user mode shouldn't
> expect" again. User mode *does* expect. End of discussion.
>
>> - from view of driver, introducing async_synchronize_full() after
>> do_one_initcall() inside do_init_module() is like a sync probe
>> for drivers built as module, and cause this kind of deadlock easily.
>>
>> So could we revert the commit and fix the previous problems just
>> case by case? or other better fix?
>
> There's no way in hell we take a "fix things one by one" approach.
> It's not going to work. And your suggestion seems to not do async
> discovery of devices in general, which is a *much* worse fix than
> anything else. It's just crazy.

I will try to figure out one patch to address the scsi block async probe
issue first, and see if it can fix the problem by moving add_disk()
into sd_probe()
and calling async_synchronize_full_domain(&scsi_sd_probe_domain)
in the entry of sd_open().

>
> But there are other approaches we might take. We might move the call to
>
>     async_synchronize_full();
>
> to other places. For example, maybe we're better off doing it at
> block/char device open instead?

Looks it is similar with the above idea, but we have to remove the
async_synchronize_full() in do_init_module() together.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ