linux-kernel - Re: [bug] SLOB crash, 2.6.24-rc2

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20071115112820.GA18228@elte.hu>
Date:	Thu, 15 Nov 2007 12:28:20 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Nick Piggin <nickpiggin@...oo.com.au>
Cc:	David Miller <davem@...emloft.net>, mpm@...enic.com, rjw@...k.pl,
	linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
	torvalds@...ux-foundation.org, Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [bug] SLOB crash, 2.6.24-rc2

* Nick Piggin <nickpiggin@...oo.com.au> wrote:

> On Thursday 15 November 2007 21:43, Ingo Molnar wrote:
> > * David Miller <davem@...emloft.net> wrote:
> > > From: Matt Mackall <mpm@...enic.com>
> > > Date: Wed, 14 Nov 2007 17:37:13 -0600
> > >
> > > > No, the usual strategy for debugging problems -outside- SLOB is to
> > > > switch to another allocator with more extensive debugging facilities.
> > >
> > > Ok, so the thing we still can do is do a dump_stack() at the list
> > > debugging assertion trigger points.
> >
> > ok, i'll first try to trigger it again.
> 
> I had implemented SLOB in userspace, so I resynched and think I found 
> your problem. Sorry for the attachment format -- this mailer isn't the 
> best. I'm really computer illiterate when it comes to userspace...

thx, i'll try your fix in a minute.

> Anyway, I'm really happy to see you're testing and using SLOB upstream
> :) Is there any particular reason that you're using it?

i sometimes test SLOB for -rt, but this time it's the result of my 
"automated random QA" effort, as part of arch/x86 maintainance/QA.

the main trick is to build and booting random "make randconfig" 
bzImages. That finds build bugs and a good deal of boot hang and crash 
bugs as well. (it also found a compiler bug already) I can build and 
boot about 1000 random kernels in 24 hours, and it's all fully 
automated. I usually run it overnight - when a kernel does not come up 
due to a bootup hang or crash (or the kernel log signals any exception 
condition) then the script stops and i can fix it in the morning.

The first step towards this was to get allyesconfig bzImage kernels to 
build and boot fine. That effort took months (we had many problems in 
this area) - i think you saw bugreports and fixes from me about that on 
lkml.

Once that worked reasonably well i made a small Kconfig patch that 
forcibly selects a "minimum set" of drivers and kernel subsystems that 
are needed to boot up a testsystem. Once a "make allnoconfig" and a 
"make allyesconfig" bzImage kernel boots up fine on the testbox all 
randconfig configs "inbetween" are supposed to build and boot fine as 
well.

I also have a patch that adds all the x86 boot options like nosmp, 
maxcpus=1, nohz=off, hpet=disable to be selectable as .config options - 
so those boot options are randomized as well.

I also have a small patch that disables half a dozen drivers/features 
that are not expected to work out of box in a bzImage kernel. (such as 
ISA drivers that assume the presence of hardware, or root filesystem 
features such as NFSROOT)

the resulting make randconfig kernel still has 99% of the degrees of 
freedom that a stock make randconfig kernel has, so by all practical 
purposes it's a fully random kernel - it just happens to boot on my 
testsystem all the time.

A successful bootup means the test system is able to boot up into a 
stock Fedora 8 userspace and is able to bring up its network interfaces 
and ssh out (automatically) to the build box to signal the completion of 
a successful test cycle. The logs are also analyzed for lockdep 
assertions (if lockdep is enabled - which it is in about 20% of the 
randconfig kernels) and other kernel bugs.

(just in case you were wondering about one of the reasons why the 
arch/x86 unification merge went so smoothly, with nary a regression ;-) 
Thomas is doing other types of automated QA of the x86 queue as well.)

this method found the SG-list corruption bugs the following night after 
Linus committed Jen's SG-list changes, so it's pretty good at finding 
regressions as early as possible.

	Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/