linux-kernel - Re: [bug] SLOB crash, 2.6.24-rc2

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <473C3900.8010508@oak.selfip.net>
Date:	Thu, 15 Nov 2007 12:18:08 +0000
From:	Dave Haywood <tla@....selfip.net>
To:	Ingo Molnar <mingo@...e.hu>
CC:	Nick Piggin <nickpiggin@...oo.com.au>,
	David Miller <davem@...emloft.net>, mpm@...enic.com,
	rjw@...k.pl, linux-kernel@...r.kernel.org,
	akpm@...ux-foundation.org, torvalds@...ux-foundation.org,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [bug] SLOB crash, 2.6.24-rc2

Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@...oo.com.au> wrote:
>
>   
>> On Thursday 15 November 2007 21:43, Ingo Molnar wrote:
>>     
>>> * David Miller <davem@...emloft.net> wrote:
>>>       
>>>> From: Matt Mackall <mpm@...enic.com>
>>>> Date: Wed, 14 Nov 2007 17:37:13 -0600
>>>>
>>>>         
>>>>> No, the usual strategy for debugging problems -outside- SLOB is to
>>>>> switch to another allocator with more extensive debugging facilities.
>>>>>           
>>>> Ok, so the thing we still can do is do a dump_stack() at the list
>>>> debugging assertion trigger points.
>>>>         
>>> ok, i'll first try to trigger it again.
>>>       
>> I had implemented SLOB in userspace, so I resynched and think I found 
>> your problem. Sorry for the attachment format -- this mailer isn't the 
>> best. I'm really computer illiterate when it comes to userspace...
>>     
>
> thx, i'll try your fix in a minute.
>
>   
>> Anyway, I'm really happy to see you're testing and using SLOB upstream
>> :) Is there any particular reason that you're using it?
>>     
>
> i sometimes test SLOB for -rt, but this time it's the result of my 
> "automated random QA" effort, as part of arch/x86 maintainance/QA.
>
> the main trick is to build and booting random "make randconfig" 
> bzImages. That finds build bugs and a good deal of boot hang and crash 
> bugs as well. (it also found a compiler bug already) I can build and 
> boot about 1000 random kernels in 24 hours, and it's all fully 
> automated. I usually run it overnight - when a kernel does not come up 
> due to a bootup hang or crash (or the kernel log signals any exception 
> condition) then the script stops and i can fix it in the morning.
>
> The first step towards this was to get allyesconfig bzImage kernels to 
> build and boot fine. That effort took months (we had many problems in 
> this area) - i think you saw bugreports and fixes from me about that on 
> lkml.
>
> Once that worked reasonably well i made a small Kconfig patch that 
> forcibly selects a "minimum set" of drivers and kernel subsystems that 
> are needed to boot up a testsystem. Once a "make allnoconfig" and a 
> "make allyesconfig" bzImage kernel boots up fine on the testbox all 
> randconfig configs "inbetween" are supposed to build and boot fine as 
> well.
>
> I also have a patch that adds all the x86 boot options like nosmp, 
> maxcpus=1, nohz=off, hpet=disable to be selectable as .config options - 
> so those boot options are randomized as well.
>
> I also have a small patch that disables half a dozen drivers/features 
> that are not expected to work out of box in a bzImage kernel. (such as 
> ISA drivers that assume the presence of hardware, or root filesystem 
> features such as NFSROOT)
>
> the resulting make randconfig kernel still has 99% of the degrees of 
> freedom that a stock make randconfig kernel has, so by all practical 
> purposes it's a fully random kernel - it just happens to boot on my 
> testsystem all the time.
>
> A successful bootup means the test system is able to boot up into a 
> stock Fedora 8 userspace and is able to bring up its network interfaces 
> and ssh out (automatically) to the build box to signal the completion of 
> a successful test cycle. The logs are also analyzed for lockdep 
> assertions (if lockdep is enabled - which it is in about 20% of the 
> randconfig kernels) and other kernel bugs.
>
> (just in case you were wondering about one of the reasons why the 
> arch/x86 unification merge went so smoothly, with nary a regression ;-) 
> Thomas is doing other types of automated QA of the x86 queue as well.)
>
> this method found the SG-list corruption bugs the following night after 
> Linus committed Jen's SG-list changes, so it's pretty good at finding 
> regressions as early as possible.
>
> 	Ingo
>   

  How complete is the QA testing?  I was reading this interesting thread
and it occurred to me that this sounds like a useful distributed
computing application.  ie a central server with all valid Kconfig
combinations (how many are there?) for a particular release (-rc or
otherwise) across all architectures.  These are allocated to clients on
request to be built / booted etc.  Any errors are fed back to the
central server.  I guess this would be a useful resource for
developers.  More importantly (and I don't know if this is the case
already!) a new Linux release (2.6.x) could be "certified" with some
level of testing on known hardware / architectures.

  tbh, I feel sorry for Ingo's machine compiling 1000 random kernels in
24h!  I'm surprised it hasn't called the Samaritans...

Dave.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/