linux-kernel - Re: 2.6.30-rc8 Oops whilst booting

Message-Id: <1244460875.12644.2.camel@ht.satnam>
Date:	Mon, 08 Jun 2009 17:04:35 +0530
From:	Jaswinder Singh Rajput <jaswinder@...nel.org>
To:	Chris Clayton <chris2553@...glemail.com>
Cc:	NeilBrown <neilb@...e.de>, linux-kernel@...r.kernel.org,
	James Bottomley <james.bottomley@...senpartnership.com>,
	scsi <linux-scsi@...r.kernel.org>, Tejun Heo <tj@...nel.org>,
	Arjan van de Ven <arjan@...ux.intel.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: 2.6.30-rc8 Oops whilst booting

Hello Chris,

On Mon, 2009-06-08 at 11:58 +0100, Chris Clayton wrote:
> 2009/6/8 Chris Clayton <chris2553@...glemail.com>:
> > Hi Neil,
> >
> > Thanks for the reply.
> >
> > 2009/6/7 NeilBrown <neilb@...e.de>:
> >> On Mon, June 8, 2009 8:31 am, Jaswinder Singh Rajput wrote:
> >>> On Sun, 2009-06-07 at 19:38 +0100, Chris Clayton wrote:
> >>>> 2009/6/7 Jaswinder Singh Ra
> >>>> >> > http://img231.imageshack.us/img231/8931/dscn0610.jpg
> >>
> >> This message says that it found a vfat filesystem on 8:3x (I cannot see
> >> what digit should be 'x').  That is probably sdc1 or sdc2. Maybe even
> >> sdc6 or sdc7.
> >> However the vfat filesystem didn't have /sbin/init.
> >>
> >
> >>>> http://img99.imageshack.us/my.php?image=dscn0617b.jpg
> >>
> >> This one says it couldn't find anything at 8,22, which I think
> >> should be sdb6.
> >> It also shows that you have an sdc6, but sdb only goes up to sdb3.
> >>
> >> So it seems that your disk drives have changed names - not a wholly
> >> unexpected event these days.
> >>
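Neil's reading of the device numbers follows the static numbering for SCSI disks on block major 8. Each disk gets 16 minors: minor 0 is the whole disk and 1..15 are its partitions. So minor 22 is the sixth partition of the second disk. Below is a minimal Python sketch of that decoding. It assumes only the first 16 sd disks (sda..sdp, all on major 8) are involved. `sd_name` is a hypothetical helper, not a kernel or library function.

```python
SCSI_DISK0_MAJOR = 8   # block major shared by sda..sdp
MINORS_PER_DISK = 16   # minor 0 = whole disk, 1..15 = partitions


def sd_name(major: int, minor: int) -> str:
    """Translate a major:minor pair into its conventional sd name."""
    if major != SCSI_DISK0_MAJOR or not 0 <= minor < 256:
        raise ValueError("only major 8 (sda..sdp) is handled here")
    disk, partition = divmod(minor, MINORS_PER_DISK)
    name = "sd" + chr(ord("a") + disk)
    return name if partition == 0 else f"{name}{partition}"


print(sd_name(8, 22))                          # sdb6
print([sd_name(8, m) for m in range(30, 40)])  # candidates for "8:3x"
```

For "8:3x" the candidates run from sdb14 through sdc7. sdb14 and sdb15 are implausible on these discs, which leaves sdc or one of its partitions, in line with Neil's guess.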
> >> We now need answers to questions like:
> >>  - what device do you expect the root filesystem to be on
> >>  - how is the kernel being told this?  Maybe it is hard coded
> >>    into your initrd.  Knowing which distro and what /etc/fstab
> >>    says might help (though it wouldn't help me, I'm just about out
> >>    of my depth at this point)
> >> Maybe if you changed /etc/fstab to mount by UUID instead of hardcoding
> >> e.g. /dev/sdb3, and then ran "mkinitramfs" or whatever, it might work.
> >>
> >
> > Yes, I've just been looking at the photographs of the panics again and
> > I've noticed that two of my discs are being detected in the "wrong
> > order". There are three HDDs. The first, /dev/sda, is the master on
> > the first IDE port and contains sda1..sda7. The second, normally
> > /dev/sdb, is the slave on that port and contains sdb1..sdb6. The
> > third, normally /dev/sdc, is attached to the first SATA port and
> > contains sdc1..sdc3. The second photograph I posted shows that sdb and
> > sdc have been reversed. The disc that is normally /dev/sdb does
> > indeed have a FAT32 filesystem in its first partition.
> >
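A toy model shows the naming problem both messages describe and why a UUID lookup avoids it. It rests on a simplifying assumption: sd letters are handed out strictly in the order the disks are detected. The disk labels and UUID string are hypothetical. The partition counts come from Chris's description. One caveat: in a setup with no initrd, the kernel itself still resolves the root= name passed by GRUB, so a UUID entry in /etc/fstab only takes effect once userspace is running.

```python
# Simplified model (assumption): sd letters follow detection order.
# Disk labels and the UUID below are hypothetical illustrations.
PARTITION_COUNT = {"ide-master": 7, "ide-slave": 6, "sata-0": 3}
ROOT_UUID = "hypothetical-uuid-of-ide-slave-part6"
UUIDS = {("ide-slave", 6): ROOT_UUID}


def device_nodes(probe_order):
    """Map names such as 'sdb6' to (disk, partition) for one boot."""
    nodes = {}
    for index, disk in enumerate(probe_order):
        letter = chr(ord("a") + index)
        for part in range(1, PARTITION_COUNT[disk] + 1):
            nodes[f"sd{letter}{part}"] = (disk, part)
    return nodes


def find_by_name(probe_order, name):
    """What a fixed name like root=/dev/sdb6 hits on this boot (None if absent)."""
    return device_nodes(probe_order).get(name)


def find_by_uuid(probe_order, uuid):
    """Current name of whichever partition carries this UUID."""
    for name, location in device_nodes(probe_order).items():
        if UUIDS.get(location) == uuid:
            return name
    return None


normal = ["ide-master", "ide-slave", "sata-0"]
swapped = ["ide-master", "sata-0", "ide-slave"]

print(find_by_name(normal, "sdb6"))      # ('ide-slave', 6)
print(find_by_name(swapped, "sdb6"))     # None: sdb is now the 3-partition SATA disc
print(find_by_uuid(swapped, ROOT_UUID))  # sdc6
```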
> > By the way, I should have said that, between the two panics shown in the
> > photographs, I copied the contents of /dev/sdc1, which I normally
> > boot from, to /dev/sdb6, so as to minimise the risk to sdc1 in the
> > reboot festival that bisecting would involve. I also, of course,
> > changed the name of the root partition that is passed to the kernel by
> > GRUB and amended /etc/fstab on /dev/sdb6. That's why the partitions
> > shown in the photographs seem inconsistent. Sorry I forgot to mention
> > that - I really shouldn't do these things late at night :-).
> >
> > As I indicate above, when booting the partition I have set up for this
> > bisecting, I expect the root filesystem to be on /dev/sdb6. As I
> > also indicate, this information is passed to the kernel through GRUB's
> > /boot/grub/menu.lst. The kernel is configured specifically for my
> > system and the drivers needed to boot the system are built in to the
> > kernel, so I don't use an initrd. IIRC, that's also how Slackware is
> > installed today, except that its stock kernel is a big fat one with
> > the drivers needed to boot any system built in. I could be wrong on
> > that, though; it's a while since I installed it.
> >
> > As to the distro, it used to be (the now defunct) Peanut Linux, which
> > was derived from Slackware. However, it's years since I installed it
> > and I have upgraded just about everything in user space and added many
> > other things (udev, dbus...). I don't think that makes any difference
> > here, though, because we don't get as far as user space. On a
> > successful boot, the system is stable and runs trouble-free for
> > several hours a day, every day.
> >
> > Hope this helps.
> >
> > I'm a good way through bisecting again and this time the system has to
> > boot without a panic 100 times before I mark a kernel as good. I'll
> > post the result later.
> >
> 
> Finally got to the end of the bisection/reboot festival. I ended up here:
> 
> [chris:~/kernel/linux-2.6]$ git bisect good
> d5a877e8dd409d8c702986d06485c374b705d340 is first bad commit
> commit d5a877e8dd409d8c702986d06485c374b705d340
> Author: James Bottomley <James.Bottomley@...senPartnership.com>
> Date:   Sun May 24 13:03:43 2009 -0700
> 
>     async: make sure independent async domains can't accidentally entangle
> 
>     The problem occurs when async_synchronize_full_domain() is called when
>     the async_pending list is not empty.  This will cause lowest_running()
>     to return the cookie of the first entry on the async_pending list, which
>     might be nothing at all to do with the domain being asked for and thus
>     cause the domain synchronization to wait for an unrelated domain.   This
>     can cause a deadlock if domain synchronization is used from one domain
>     to wait for another.
> 
>     Fix by running over the async_pending list to see if any pending items
>     actually belong to our domain (and return their cookies if they do).
> 
>     Signed-off-by: James Bottomley <James.Bottomley@...senPartnership.com>
>     Signed-off-by: Arjan van de Ven <arjan@...ux.intel.com>
>     Signed-off-by: Linus Torvalds <torvalds@...ux-foundation.org>
> 
> :040000 040000 fab1e0c06572605a7015061db4a7e0a77c04fa91 34252dbb7fed3942f5952c25639564bbd77357da M      kernel
> 
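The commit message describes the bug in terms of lowest_running() and the async_pending list. Below is a simplified Python model of that logic as the message describes it. It is not the kernel source: the real code lives in kernel/async.c, is written in C, and uses list_head lists and locking. `Entry`, the `running`/`pending` containers and the two function names are hypothetical. A domain counts as synchronized up to a cookie once the lowest in-progress cookie for that domain is at least that cookie.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    cookie: int  # increasing sequence number handed out when work is scheduled
    domain: str  # async domain the work item belongs to


def lowest_running_old(domain, running, pending, next_cookie):
    """Before the fix, per the commit message: the first pending entry
    wins, whichever domain it belongs to."""
    if pending:
        return pending[0].cookie
    if running[domain]:
        return running[domain][0].cookie
    return next_cookie  # nothing in progress: acts as "infinity"


def lowest_running_new(domain, running, pending, next_cookie):
    """After the fix: only running or pending items of this domain count."""
    cookies = [e.cookie for e in running[domain]]
    cookies += [e.cookie for e in pending if e.domain == domain]
    return min(cookies, default=next_cookie)


running = {"scsi": [], "other": []}
pending = [Entry(cookie=5, domain="other")]  # unrelated, not yet started
next_cookie = 11

# Has all "scsi" work scheduled before cookie 10 finished?
print(lowest_running_old("scsi", running, pending, next_cookie) >= 10)  # False
print(lowest_running_new("scsi", running, pending, next_cookie) >= 10)  # True
```

In the old model the "scsi" domain keeps waiting on cookie 5 from an unrelated domain. If that domain were in turn waiting on "scsi", neither could make progress, which is the deadlock the commit message describes.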
> I can't claim to know what the change actually means, but it seems a
> much better candidate than the outcome of my previous bisection, for
> which I required only 20 "panicless" boots to regard a kernel as
> good. As I said earlier today, this time I required 100 such boots.
> 
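A simple assumption shows why 100 clean boots is a much stricter test than 20. Suppose a bad kernel panics independently on each boot with a fixed probability p. It then gets through n boots unnoticed with probability (1 - p)^n. The rate p = 0.05 below is purely illustrative; the thread gives no measured panic rate, and `miss_probability` and `boots_needed` are hypothetical helpers.

```python
import math


def miss_probability(panic_rate: float, boots: int) -> float:
    """Chance a bad kernel gets through `boots` consecutive clean boots."""
    return (1.0 - panic_rate) ** boots


def boots_needed(panic_rate: float, confidence: float) -> int:
    """Smallest n with miss_probability(panic_rate, n) <= 1 - confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-panic_rate))


p = 0.05  # illustrative per-boot panic rate, not measured
print(f"{miss_probability(p, 20):.3f}")   # 0.358
print(f"{miss_probability(p, 100):.4f}")  # 0.0059
print(boots_needed(p, 0.99))              # 90
```

At that illustrative rate, a 20-boot test would pass a bad kernel about a third of the time. A single wrong "good" verdict is enough to send a bisection down the wrong branch.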
> I'll revert that change, give the new kernel the reboot treatment :-)
> and report back later.
> 

Good work. Please also share this info with the other people who signed off
on that commit, so I am adding them to CC.

Thanks,
--
JSR
