Message-ID: <20081114061847.GB2227@x200.localdomain>
Date:	Fri, 14 Nov 2008 09:18:47 +0300
From:	Alexey Dobriyan <adobriyan@...il.com>
To:	"Zhang, Yanmin" <yanmin_zhang@...ux.intel.com>
Cc:	Jens Axboe <jens.axboe@...cle.com>, tj@...nel.org,
	LKML <linux-kernel@...r.kernel.org>, albcamus@...il.com,
	pjones@...hat.com, alex.shi@...el.com
Subject: Re: system fails to boot

On Fri, Nov 14, 2008 at 01:16:21PM +0800, Zhang, Yanmin wrote:
> Jens,
> 
> We ran into a system boot failure with kernel 2.6.28-rc. We found it on several
> machines, including a T61 notebook, a Nehalem machine, and an HPC NX6325 notebook.
> All the machines run FedoraCore 8 or FedoraCore 9. With kernels prior to 2.6.28-rc,
> system boot doesn't fail.
> 
> I debugged it and located the root cause. Please see
> http://bugzilla.kernel.org/show_bug.cgi?id=11899
> https://bugzilla.redhat.com/show_bug.cgi?id=471517
> 
> As a matter of fact, there are 2 bugs.
> 
> 1) root=/dev/sda1: system boot fails randomly, roughly once in every 5
> boots. nash has a bug: some of its functions misuse the return value 0.
> Sometimes 0 means a timeout with no uevent available. Sometimes 0 means nash got
> a uevent, but the uevent isn't block-related (for example, usb). If, by coincidence,
> the kernel tells nash that uevents are available but has also set the timeout, nash
> might stop collecting the other uevents in the queue if the current uevent isn't
> block-related. I worked out a patch for nash to fix it.
> http://bugzilla.kernel.org/attachment.cgi?id=18858
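
The ambiguity described in bug 1) can be illustrated with a small sketch. This is not the nash code or the attached patch: the enum, the function names, and the simulated queue (an array of subsystem strings in which NULL stands for "timed out, nothing queued") are all hypothetical. It only shows the idea of giving "timeout" and "non-block uevent" distinct return values, so that the collecting loop stops on a timeout alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes; none of these names come from nash. */
enum uevent_result {
    UEVENT_TIMEOUT,  /* nothing arrived before the deadline */
    UEVENT_IGNORED,  /* an event arrived, but not from the block subsystem */
    UEVENT_BLOCK     /* a block-related event arrived */
};

/* Classify one simulated event; NULL stands in for a timeout. */
static enum uevent_result classify_uevent(const char *subsystem)
{
    if (subsystem == NULL)
        return UEVENT_TIMEOUT;
    return strcmp(subsystem, "block") == 0 ? UEVENT_BLOCK : UEVENT_IGNORED;
}

/*
 * Walk a simulated queue and count the block events in it.
 * Only UEVENT_TIMEOUT ends the loop; UEVENT_IGNORED moves on to the
 * next queued event instead of being mistaken for "queue is empty".
 */
static size_t drain_uevents(const char *const *queue, size_t len)
{
    size_t i, blocks = 0;

    assert(queue != NULL);
    for (i = 0; i < len; i++) {
        enum uevent_result r = classify_uevent(queue[i]);

        if (r == UEVENT_TIMEOUT)
            break;
        if (r == UEVENT_BLOCK)
            blocks++;
    }
    return blocks;
}
```

With a single overloaded 0, the first "usb" entry in a queue would end collection early and any block event queued behind it would be missed.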
> 
> 2) root=LABEL=/: the system can never boot. The initrd init reports that
> switchroot fails. Here is an execution path of nash when booting:
>     (1) nash reads /sys/block/sda/dev; assume the major is 8 (on my desktop);
>     (2) nash queries /proc/devices with the major number and finds the line "8 sd";
>     (3) nash uses 'sd' to search its own probe table to find the device (DISK) type for the
>        device and adds it to its own list;
>     (4) later on, it probes all devices in its list to get filesystem labels.
> scsi always registers "8 sd".
> When the major is 259, nash fails to find the device (DISK) type. I enabled
> CONFIG_DEBUG_BLOCK_EXT_DEVT=y when compiling the kernel, so 259 is picked for device
> /dev/sda1, which causes nash to fail to find the device (DISK) type.
> To fix issue 2), I created a patch for nash and another patch for the kernel.
> http://bugzilla.kernel.org/attachment.cgi?id=18859
> http://bugzilla.kernel.org/attachment.cgi?id=18837
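
Step (2) of the path above, and the reason it fails for major 259, can be sketched in user space. This is only an illustration, not nash's implementation: find_block_driver() is a hypothetical helper, and it parses a text buffer laid out like /proc/devices (a "Block devices:" header followed by "major name" lines) rather than reading the real file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up the driver name registered for a block major in text laid out
 * like /proc/devices. Returns 0 and fills `name` on success, -1 if the
 * major has no entry in the "Block devices:" section.
 */
static int find_block_driver(const char *text, unsigned int major,
                             char *name, size_t name_len)
{
    const char *p;

    assert(text != NULL && name != NULL && name_len > 0);

    p = strstr(text, "Block devices:");
    if (p == NULL)
        return -1;
    p = strchr(p, '\n');            /* end of the section header line */

    while (p != NULL && *++p != '\0') {
        unsigned int m;
        char drv[64];

        /* Each entry looks like "  8 sd". */
        if (sscanf(p, "%u %63s", &m, drv) == 2 && m == major) {
            snprintf(name, name_len, "%s", drv);
            return 0;
        }
        p = strchr(p, '\n');        /* advance to the next line */
    }
    return -1;
}
```

If no driver has registered major 259, the lookup returns -1 and there is no name to search the probe table with. Once a "259 blkext" line is present, the lookup succeeds, and a probe table that knows the name "blkext" can map it to a device type.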
> 
> Below is the patch for kernel 2.6.28-rc4. It registers blkext as a new block device entry in /proc/devices.
> 
> With the 2 nash patches and the 1 kernel patch applied, I booted my machines dozens of times
> without failure.
> 
> Signed-off-by: Zhang Yanmin <yanmin.zhang@...ux.intel.com>
> 
> Would you accept the kernel patch into your testing tree? Please CC me when replying,
> as I can't subscribe to LKML emails at the moment.
> 
> ---
> 
> --- linux-2.6.28-rc4/block/genhd.c	2008-11-11 08:37:24.000000000 +0800
> +++ linux-2.6.28-rc4_label/block/genhd.c	2008-11-13 04:05:35.000000000 +0800
> @@ -1028,6 +1028,7 @@ static int __init proc_genhd_init(void)
>  {
>  	proc_create("diskstats", 0, NULL, &proc_diskstats_operations);
>  	proc_create("partitions", 0, NULL, &proc_partitions_operations);
> +	register_blkdev(BLOCK_EXT_MAJOR, "blkext");
>  	return 0;
>  }
>  module_init(proc_genhd_init);

proc_genhd_init() is procfs-specific init. Why register the block device there?