[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4602C83B.20608@shadowen.org>
Date: Thu, 22 Mar 2007 18:17:31 +0000
From: Andy Whitcroft <apw@...dowen.org>
To: Con Kolivas <kernel@...ivas.org>
CC: Andy Whitcroft <apw@...dowen.org>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, Steve Fox <drfickle@...ibm.com>,
"Martin J. Bligh" <mbligh@...igh.org>
Subject: Re: 2.6.21-rc4-mm1
Andy Whitcroft wrote:
> Con Kolivas wrote:
>> On Thursday 22 March 2007 20:48, Andy Whitcroft wrote:
>>> Andy Whitcroft wrote:
>>>> Andy Whitcroft wrote:
>>>>> Andrew Morton wrote:
>>>>>> Temporarily at
>>>>>>
>>>>>> http://userweb.kernel.org/~akpm/2.6.21-rc4-mm1/
>>>>>>
>>>>>> Will appear later at
>>>>>>
>>>>>>
>>>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc
>>>>>> 4/2.6.21-rc4-mm1/
>>>>> [All of the below is from the pre hot-fix runs. The very few results
>>>>> which are in for the hot-fix runs seem worse if anything. :( All
>>>>> results should be out on TKO.]
>>>>>
>>>>>> - Restored the RSDL CPU scheduler (a new version thereof)
>>>>> Unsure if the above is the culprit but there seems to be a smattering of
>>>>> BUG's in kernbench from the schedular on several systems, and panics
>>>>> which do not fully dump out.
>>>>>
>>>>> elm3b239 is about 2/4 kernbench being the test in progress when we
>>>>> blammo in both failed tests, elm3b234 doesn't boot at all.
>>>> Well I have one result through for backing RSDL out on elm3b239 and that
>>>> does indeed seem to give us a successful boot and test. peterz has
>>>> pointed me to an incremental patch from Con which I'll push through
>>>> testing and see if that sorts it out.
>>> Ok, tested the patch below on top of 2.6.21-rc4-mm1 and this seems to
>>> fix the problem:
>>>
>>> http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc4-mm1-rsdl-0.32.p
>>> atch
>>>
>>> Hard to tell from that patch whether it will be fixed in the changes
>>> already committed to the next -mm.
>>>
>>> Its possible that it may be fixed by the following patch:
>>>
>>> sched-rsdl-improvements.patch
>>>
>>> Which has the following slipped in at the end of the changelog:
>>>
>>> A tiny change checking for MAX_PRIO in normal_prio()
>>> may prevent oopses on bootup on large SMP due to
>>> forking off the idle task.
>>>
>>> Con, are all the changes in the 0.32 patch above with akpm?
>> Yes he's queued everything in that patch you tested for the next -mm. Thanks
>> very much for testing it.
>
> No worries. I've just got through the results on the other machine in
> the mix. That machine seems to be fixed by backing out RSDL and not by
> the fixup 0.32 patch ...
>
> This second machine seems to had hard very soon after user space starts
> executing but without a panic. I can't say that the symptoms are very
> definitive, but I do have a good result from that machine without RSDL
> and not with rsdl-0.32.
>
> The machine is a dual-core x86_64 machine: Dual Core AMD Opteron(tm)
> Processor 275.
>
> I'll let you know if I find out anything else. Shout if you want any
> information or have anything you want poked or tested.
Ok, I have yet a third x86_64 machine is is blowing up with the latest
2.6.21-rc4-mm1+hotfixes+rsdl-0.32 but working with
2.6.21-rc4-mm1+hotfixes-RSDL. I have results on various hotfix levels
so I have just fired off a set of tests across the affected machines on
that latest hotfix stack plus the RSDL backout and the results should be
in in the next hour or two.
I think there is a strong correlation between RSDL and these hangs. Any
suggestions as to the next step.
-apw
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists