Message-ID: <4ACA96B9.7000909@gmail.com>
Date: Mon, 05 Oct 2009 18:00:41 -0700
From: "Justin P. Mattock" <justinmattock@...il.com>
To: Ingo Molnar <mingo@...e.hu>
CC: Jason Baron <jbaron@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Li Zefan <lizf@...fujitsu.com>,
Steven Rostedt <rostedt@...dmis.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: system gets stuck in a lock during boot
Justin Mattock wrote:
> On Sun, Oct 4, 2009 at 10:41 AM, Ingo Molnar<mingo@...e.hu> wrote:
>
>> * Jason Baron<jbaron@...hat.com> wrote:
>>
>>
>>> On Mon, Sep 07, 2009 at 02:49:44PM -0700, Justin Mattock wrote:
>>>
>>>>>> * Justin P. Mattock<justinmattock@...il.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Ingo Molnar wrote:
>>>>>>>
>>>>>>>
>>>>>>>> * Justin Mattock<justinmattock@...il.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> O.K. I feel better, deleted
>>>>>>>>> my system, and threw in a minimal built system
>>>>>>>>> with only the bare essentials to boot.
>>>>>>>>> (just to make sure things are correct).
>>>>>>>>>
>>>>>>>>> unfortunately after building rc6 I'm still hitting
>>>>>>>>> this. really am not sure why this is happening.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Could you please double-check the bisection result by doing this:
>>>>>>>>
>>>>>>>> git revert af6af30c0f
>>>>>>>>
>>>>>>>> on the latest kernel and seeing whether that fixes the lockup?
>>>>>>>>
>>>>>>>> Bisections are very efficient and hence very sensitive as well to
>>>>>>>> minimal errors. Just one small mistake near the end of a bisection
>>>>>>>> can blame the wrong commit.
>>>>>>>>
>>>>>>>> So the best way to double-check such 100%-triggerable crashes is to
>>>>>>>> do the revert. I tried the revert and it can be done fine here.
>>>>>>>>
>>>>>>>> [ _If_ that does not fix the bug then to save time you can
>>>>>>>> 'backtrack' the bisection, instead of re-doing it completely.
>>>>>>>> I.e. you have your bisection log, re-check the final steps going
>>>>>>>> backwards. Once you find a discrepancy (i.e. a 'bad' point that
>>>>>>>> is 'good' or the other way around), redo the bisection log
>>>>>>>> commands up to that point and continue it up to the end. ]
>>>>>>>>
>>>>>>>> Ingo
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> shoot, I did not see your post here. when looking at my bisect
>>>>>>> log, I guess after a git bisect reset it clears?
>>>>>>>
>>>>>>> Anyways after git bisect had finished I looked manually at the
>>>>>>> commits that it had generated the one which I had sent in a post
>>>>>>> previously, and this one:
>>>>>>>
>>>>>>> 9424edc2da097c8589fcc24a72552d33e54be161
>>>>>>>
>>>>>>>
>>>>>> (this commit has no effect on your kernel image, at all.)
>>>>>>
>>>>>>
>>>>>>
>>>>> yep. but it was worth a try.
>>>>>
>>>>>>> at the time looking at the commit, I see this to be more of the
>>>>>>> cause because of it being related to elf as so forth, but as soon
>>>>>>> as I reverted this on rc6 made no difference.(the previous commit
>>>>>>> fixes this for me, on a regular tar.ball as well as in git.
>>>>>>>
>>>>>>> I think at this point since this system is a fresh from scratch
>>>>>>> build, I think something might be wrong that I'm doing (all the
>>>>>>> CFLAGS, and such are in a previous post).
>>>>>>>
>>>>>>> At the moment I don't have a problem applying a patch to the
>>>>>>> kernel for this. especially since I'm the only one that seems to
>>>>>>> be hitting this, then if more and more reports of this happen then
>>>>>>> we can go from there.
>>>>>>>
>>>>>>>
>>>>>> What would be nice is to verify your bisection end result, i.e. do
>>>>>> what i suggested:
>>>>>>
>>>>>>
>>>>>>
>>>>> yeah I've done this on both kernels three to be exact, and all boot after
>>>>> reverting
>>>>> Fix perf-tracepoint OOPS.
>>>>>
>>>>> As for my system, I'm still convinced that I might be doing something wrong
>>>>> over here.
>>>>>
>>>>>
>>>>>>>> Could you please double-check the bisection result by doing this:
>>>>>>>>
>>>>>>>> git revert af6af30c0f
>>>>>>>>
>>>>>>>> on the latest kernel and seeing whether that fixes the lockup?
>>>>>>>>
>>>>>>>>
>>>>>> if this doesnt fix it on latest -git then this commit is not the
>>>>>> cause of the lockup.
>>>>>>
>>>>>> Ingo
>>>>>>
>>>>>>
>>>>>>
>>>>> This commit(Fix perf-tracepoint OOPS.)does fix my stuckage, but I'm left, as
>>>>> well as others asking
>>>>> the question of why.
>>>>> In any case I still think I'm setting something wrong with either gcc, or
>>>>> something
>>>>> that might be causing this from userland.
>>>>>
>>>>> Justin P. Mattock
>>>>>
>>>>>
>>>> O.k. here something awkward about this issue I was
>>>> experiencing. at the moment I have two imac's
>>>> here the descriptions:
>>>>
>>>> imac A) the one with the problem
>>>>
>>>> OS: built from the clfs book
>>>> x86_64 multilib with only lib64
>>>>
>>>> built everything with these flags:
>>>> CFLAGS="-m64 -mtune=core2 -march=core2
>>>> -mfpmath=both -O2 -pipe -fomit-frame-pointer
>>>> -fstack-protection"
>>>> CXXFLAGS="${CFLAGS}" MAKEOPTS="{-j3}"
>>>> while compiling everything with
>>>> gcc version: 4.5.0 20090730
>>>>
>>>>
>>>> imac B) the one that works
>>>>
>>>> OS: clfs(just built a few days ago)
>>>> x86_64 pure64 bit build
>>>> (lib with a symlink to lib64)
>>>> CFLAGS="-m64 -mtune=core2 -march=core2
>>>> -O2 -pipe -fomit-frame-pointer"
>>>> CXXFLAGS="${CFLAGS}" MAKEOPTS="{-j3}"
>>>> gcc version: 4.4.1 (GCC for Cross-LFS 4.4.1.20090722)
>>>>
>>>> The only things I can think of is either I hit something
>>>> because of gcc, something goes wrong with the libraries,
>>>> or there something happening with either the option
>>>> of mfpmath=both or stackprotection.
>>>>
>>>> At this point since the kernel seems to be running fine,
>>>> is to just trash the system that has this issue and just leave
>>>> it at, I was hitting some weird anomaly.
>>>>
>>>>
>>> hi Justin,
>>>
>>> I've been playing around with gcc '4.5' as well and hit a panic that
>>> looks very similar to what you've seen with stock 2.6.31 - I haven't
>>> seen it anywhere else. Anyways, it seems to be some sort of alignment
>>> issue with the 'struct ftrace_event_call'. I'm not sure yet if this is a
>>> compiler or kernel issue. But the following kernel patch fixes the issue
>>> for me. It would be interesting to verify if the patch also resolves the
>>> issue for you.
>>>
>> Would be nice to know precisely what kind of problem is being hit here -
>> we'd like to fix either the kernel or GCC - depending on where the bug
>> lies.
>>
>> Ingo
>>
>>
>
> So I wasn't going crazy....
> Anyways that system(clfs)
> I still have, I can go ahead and
> put it back on the machine and see if I hit this
> again(keep in mind, just got back from a 7hr drive,
> so it might be tomorrow).
>
>
O.k. I put that system back on the machine and
hit the error. I added your patch to 2.6.31-rc6
and to the latest git (a few days old).
I am still hitting this, but with your patch
I'm able to see the beginning of the panic
(I'll write it out manually):
[ 2.523966] Kernel panic - not syncing: No init found. Try passing init= option to the kernel
[ 2.524394] Pid: 1, comm: swapper Not tainted 2.6.31-rc6 #6
[ 2.524633] Call Trace:
[ 2.524875] [<ffffffff813a5b72>] panic+0x75/0x120
[ 2.525119] [<ffffffff8100910f>] init_post+0xef/0xf5
[ 2.525357] [<ffffffff815f6cf0>] kernel_init+0x198/0x1a3
[ 2.525600] [<ffffffff8102410a>] child_rip+0xa/0x20
[ 2.525842] [<ffffffff815f6b58>] ? kernel_init+0x0/0x1a3
[ 2.526084] [<ffffffff81024100>] ? child_rip+0x0/0x20
It seems I only hit this when using gcc 4.5.0 and compiling
sysvinit with SELinux support to load the policy at boot
(here's the patch I used:
http://readlist.com/lists/tycho.nsa.gov/selinux/3/15451.html).
Sounds like gcc is doing something (correct me if I'm
wrong), because the other systems I have are using the same
packages except for an older version of gcc.
Maybe I should update sysvinit with a better patch to load the policy.
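
Since the trace above was typed by hand, here is a minimal Python sketch for sanity-checking it: each entry's address minus its offset should give the same start address for a given symbol. The helper names (symbol_bases, inconsistent) are made up for illustration, and the sample assumes the last entry's address is ffffffff81024100.

```python
import re

# Matches trace entries such as "[<ffffffff813a5b72>] panic+0x75/0x120",
# with an optional "?" marking an unreliable frame.
ENTRY = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(?:\?\s+)?(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def symbol_bases(trace):
    """Return {symbol: set of start addresses implied by its entries}."""
    bases = {}
    for addr, sym, off, _size in ENTRY.findall(trace):
        # Start of the function = reported address - offset into it.
        bases.setdefault(sym, set()).add(int(addr, 16) - int(off, 16))
    return bases


def inconsistent(trace):
    """List symbols whose entries imply more than one start address."""
    return sorted(s for s, b in symbol_bases(trace).items() if len(b) > 1)


# Entries copied from the panic above (last address is an assumption).
TRACE = """
[<ffffffff813a5b72>] panic+0x75/0x120
[<ffffffff8100910f>] init_post+0xef/0xf5
[<ffffffff815f6cf0>] kernel_init+0x198/0x1a3
[<ffffffff8102410a>] child_rip+0xa/0x20
[<ffffffff815f6b58>] ? kernel_init+0x0/0x1a3
[<ffffffff81024100>] ? child_rip+0x0/0x20
"""

if __name__ == "__main__":
    print(inconsistent(TRACE))
```

Any symbol the script prints has entries that disagree about where the function starts, which usually means a digit was mistyped while copying.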
Justin P. Mattock