Open Source and information security mailing list archives
 
Date:	Wed, 4 Dec 2013 17:01:53 -0600
From:	Josh Hunt <joshhunt00@...il.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Jonathan Nieder <jrnieder@...il.com>,
	Ming Lei <ming.lei@...onical.com>,
	Alex Riesen <raa.lkml@...il.com>,
	Alan Stern <stern@...land.harvard.edu>,
	Jens Axboe <axboe@...nel.dk>,
	USB list <linux-usb@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Arjan van de Ven <arjan@...ux.intel.com>,
	Rusty Russell <rusty@...tcorp.com.au>
Subject: Re: [3.8-rc3 -> 3.8-rc4 regression] Re: [PATCH] module, async:
 async_synchronize_full() on module init iff async is used

On Tue, Dec 3, 2013 at 9:19 AM, Tejun Heo <tj@...nel.org> wrote:
> Hello,
>
> On Tue, Dec 03, 2013 at 08:28:43AM -0600, Josh Hunt wrote:
>> You're right. Thanks for pointing this out. I did not realize there
>> was a bug in the init script. The version of initramfs-tools I was
>> using had the following bug:
>> https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/1215911
>>
>> Updating to 0.99ubuntu13.4 of initramfs-tools resolved my boot hangs.
>>
>> I did try the workaround Linus suggested. In my setup the
>> dm_init() code was hit; however, it still appeared to run too late at
>> times. I also tried moving the call to async_synchronize_full() above
>> the for loop and it still had the same issue (patch attached). Out of
>> around 10 reboot tests, it failed to find root once or twice.
>>
>> The Ubuntu scripts never actually call do_mount() if they can't
>> find the device. They seem to rely on some udev functionality to tell
>> them when the device is present, and if that fails they just bail out.
>>
>> This change has introduced a regression. However, I only noticed it
>> because my init script had a bug which caused it not to wait for
>> the device to appear.
>
> Hmmm.... so, I read the bug report, dug and asked around a bit.
> Here's the root problem: Ubuntu's initramfs uses a tool to wait for
> the root device, which uses libudev to listen for the device event.
> Unfortunately, its rx buffer is not set large enough and the receiver
> isn't fast enough, which means that netlink broadcast messages from
> the kernel can overrun the buffer.  When that happens, the kernel sets
> an error on the socket, so the next recv fails with -ENOBUFS.  If that
> happens, the wait for root aborts immediately and initramfs proceeds
> to mount a non-existent root device.
>
> The only thing these patches change is the timing of events.
> The problem likely wasn't as exposed before because things were slow
> enough that either the messages could be consumed in time or there
> was enough delay between libata module load and the root device wait
> to hide the bug in the wait logic.
>
> So, yeah, it's a full-blown timing bug.  I'm not sure what we can do
> to work around it from the kernel side except randomly slowing things
> down or forcibly enlarging the rx buffer size.  There really is no
> interlocking to take advantage of. :(

So there used to be a call to async_synchronize_full() in
ata_host_register(), but it was removed by
f29d3b23238e1955a8094e038c72546e99308e61 as part of some fastboot
changes. Adding it back (in the attached patch) seems to resolve the
issue when using the broken initrd. I'm guessing adding it back isn't
an option, but I wanted to point it out.
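Why restoring the call hides the race can be modeled in userspace. The sketch below is an analogy only, not libata code: threads stand in for port probes queued with async_schedule(), and joining them before returning plays the role of async_synchronize_full() in ata_host_register(). `host_register`, `probe_port` and `NPORTS` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NPORTS 4
static_assert(NPORTS > 0, "need at least one port");

/* Hypothetical stand-in for "disks whose probe has finished". */
static atomic_int probed_disks;

/* Stand-in for one port probe queued with async_schedule(). */
static void *probe_port(void *arg)
{
    (void)arg;
    /* ...slow link reset and device identification would happen here... */
    atomic_fetch_add(&probed_disks, 1);
    return NULL;
}

static void join_all(pthread_t *tid, int n)
{
    for (int i = 0; i < n; i++)
        pthread_join(tid[i], NULL);
}

/* Start one probe per port and return how many disks are visible when the
 * function returns. With `synchronize`, wait for every probe first;
 * without it, the caller looks while probes may still be running. */
int host_register(bool synchronize)
{
    pthread_t tid[NPORTS];
    int started = 0, visible;

    atomic_store(&probed_disks, 0);
    for (int i = 0; i < NPORTS; i++)
        if (pthread_create(&tid[started], NULL, probe_port, NULL) == 0)
            started++;
    if (synchronize)
        join_all(tid, started); /* analogous to async_synchronize_full() */
    visible = atomic_load(&probed_disks);
    if (!synchronize)
        join_all(tid, started); /* reap threads after the early look */
    return visible;
}
```

With synchronization the caller always sees all NPORTS disks; without it the count can be anything from 0 to NPORTS, which is the window a wait-for-root that gives up early can fall into. Note that the real async_synchronize_full() waits for all outstanding async work, not just one host's probes.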

-- 
Josh

Attachment: "dbg-ata.patch" (text/x-patch, 520 bytes)
