Message-ID: <20131203151953.GK8277@htj.dyndns.org>
Date:	Tue, 3 Dec 2013 10:19:53 -0500
From:	Tejun Heo <tj@...nel.org>
To:	Josh Hunt <joshhunt00@...il.com>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Jonathan Nieder <jrnieder@...il.com>,
	Ming Lei <ming.lei@...onical.com>,
	Alex Riesen <raa.lkml@...il.com>,
	Alan Stern <stern@...land.harvard.edu>,
	Jens Axboe <axboe@...nel.dk>,
	USB list <linux-usb@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Arjan van de Ven <arjan@...ux.intel.com>,
	Rusty Russell <rusty@...tcorp.com.au>
Subject: Re: [3.8-rc3 -> 3.8-rc4 regression] Re: [PATCH] module, async:
 async_synchronize_full() on module init iff async is used

Hello,

On Tue, Dec 03, 2013 at 08:28:43AM -0600, Josh Hunt wrote:
> You're right. Thanks for pointing this out. I did not realize there
> was a bug in the init script. The version of initramfs-tools I was
> using had the following bug:
> https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/1215911
> 
> Updating to 0.99ubuntu13.4 of initramfs-tools resolved my boot hangs.
> 
> I did try using the workaround as suggested by Linus. In my setup the
> dm_init() code was hit, however it still appeared to be too late at
> times. I also tried moving the call to async_synchronize_full() above
> the for loop and it still had the same issue (patch attached.) Out of
> around 10 reboot tests it failed to find root 1 or 2 times.
> 
> The ubuntu scripts don't ever actually call do_mount() if it can't
> find the device. It seems to rely on some udev functionality to tell
> it when the device is present, and if that fails it just bails out.
> 
> This change has introduced a regression. However, I only noticed it
> b/c my init script had a bug which caused it not to wait around for
> the device to appear.

Hmmm.... so, I read the bug report, dug and asked around a bit.
Here's the root problem - ubuntu's initramfs uses a tool to wait for
the root device which uses libudev to listen for the device event;
unfortunately, its rx buffer is not set large enough and the receiver
isn't fast enough, which means that netlink broadcast messages from
the kernel can overrun the buffer.  When that happens, the kernel sets
an error on the socket, so the next recv fails with -ENOBUFS.  If that
happens, the wait for root aborts immediately and initramfs proceeds
to mount a non-existent root device.
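
To make the failure mode concrete, here is a minimal sketch of a wait
loop that treats ENOBUFS as "events were lost, re-check" rather than as
a fatal error.  This is not the ubuntu tool's code; wait_for_device(),
is_root_event() and root_exists() are hypothetical names, and sock is
assumed to be any object with a recv() method (for example a netlink
uevent socket).

```python
import errno


def wait_for_device(sock, is_root_event, root_exists, bufsize=8192):
    """Block until the root device shows up; return how it was detected.

    sock          -- object with recv(); assumed to deliver uevent messages
    is_root_event -- hypothetical callback: does this message announce root?
    root_exists   -- hypothetical callback: is the root device present now?
    """
    # The device may already be there before we start listening.
    if root_exists():
        return "already-present"
    while True:
        try:
            msg = sock.recv(bufsize)
        except OSError as exc:
            if exc.errno != errno.ENOBUFS:
                raise
            # Receive buffer overrun: some events were dropped, possibly
            # the one we were waiting for.  Re-check the device directly
            # instead of aborting the wait.
            if root_exists():
                return "found-after-overrun"
            continue
        if is_root_event(msg):
            return "event"
```

The point of the sketch is only that an overrun is recoverable from the
receiver's side: the error says messages were dropped, not that the
device is missing.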

The only thing these patches change is the timing of events.
The problem likely wasn't as exposed before because things were slow
enough that either the messages could be consumed in time or
there was enough delay between libata module load and the root device
wait to hide the bug in the wait logic.

So, yeah, it's a full-blown timing bug.  I'm not sure what we can do
to work around it from the kernel side except for randomly slowing
things down or forcefully enlarging the rx buffer size.  There really
is no interlocking to take advantage of. :(
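
For reference, the receiver can also ask for a larger rx buffer itself.
The following is a rough sketch using the standard SO_RCVBUF socket
option; enlarge_rcvbuf() is a hypothetical helper name, and the kernel
may clamp the request (on Linux to net.core.rmem_max), so the function
returns the size that was actually applied.

```python
import socket


def enlarge_rcvbuf(sock, requested):
    """Request a larger receive buffer and return the size now in effect."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
    # Read the value back: the kernel may clamp or adjust the request,
    # so the result is not guaranteed to equal `requested`.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

A bigger buffer only makes an overrun less likely; it does not remove
the need to handle ENOBUFS in the receiver.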

Thanks.

-- 
tejun