linux-kernel - Re: system gets stuck in a lock during boot

Message-ID: <dd18b0c30909071449q6834e847yb0f27ec971c9564a@mail.gmail.com>
Date:	Mon, 7 Sep 2009 14:49:44 -0700
From:	Justin Mattock <justinmattock@...il.com>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Li Zefan <lizf@...fujitsu.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: system gets stuck in a lock during boot

On Wed, Aug 26, 2009 at 7:42 AM, Justin P. Mattock <justinmattock@...il.com> wrote:
> Ingo Molnar wrote:
>>
>> * Justin P. Mattock<justinmattock@...il.com>  wrote:
>>
>>
>>>
>>> Ingo Molnar wrote:
>>>
>>>>
>>>> * Justin Mattock<justinmattock@...il.com>   wrote:
>>>>
>>>>
>>>>
>>>>>
>>>>> O.K. I feel better, deleted
>>>>> my system, and threw in a minimal built system
>>>>> with only the bare essentials to boot.
>>>>> (just to make sure things are correct).
>>>>>
>>>>> unfortunately after building rc6 I'm still hitting
>>>>> this. really am not sure why this is happening.
>>>>>
>>>>>
>>>>
>>>> Could you please double-check the bisection result by doing this:
>>>>
>>>>   git revert af6af30c0f
>>>>
>>>> on the latest kernel and seeing whether that fixes the lockup?
>>>>
>>>> Bisections are very efficient and hence very sensitive as well to
>>>> minimal errors. Just one small mistake near the end of a bisection
>>>> can blame the wrong commit.
>>>>
>>>> So the best way to double-check such 100%-triggerable crashes is to
>>>> do the revert. I tried the revert and it can be done fine here.
>>>>
>>>> [ _If_ that does not fix the bug then to save time you can
>>>>     'backtrack' the bisection, instead of re-doing it completely.
>>>>     I.e. you have your bisection log, re-check the final steps going
>>>>     backwards. Once you find a discrepancy (i.e. a 'bad' point that
>>>>     is 'good' or the other way around), redo the bisection log
>>>>     commands up to that point and continue it up to the end. ]
>>>>
>>>>        Ingo
>>>>
>>>>
>>>>
>>>
>>> shoot, I did not see your post here. when looking at my bisect
>>> log, I guess after a git bisect reset it clears?
>>>
>>> Anyway, after git bisect had finished I looked manually at the
>>> commits that it had generated: the one which I had sent in a
>>> previous post, and this one:
>>>
>>>  9424edc2da097c8589fcc24a72552d33e54be161
>>>
>>
>> (this commit has no effect on your kernel image, at all.)
>>
>>
>
> yep. but it was worth a try.
>>>
>>> at the time, looking at the commit, I saw this as more likely the
>>> cause because of it being related to elf and so forth, but
>>> reverting it on rc6 made no difference. (The previous commit
>>> fixes this for me, on a regular tarball as well as in git.)
>>>
>>> I think at this point since this system is a fresh from scratch
>>> build, I think something might be wrong that I'm doing (all the
>>> CFLAGS, and such are in a previous post).
>>>
>>> At the moment I don't have a problem applying a patch to the
>>> kernel for this. especially since I'm the only one that seems to
>>> be hitting this, then if more and more reports of this happen then
>>> we can go from there.
>>>
>>
>> What would be nice is to verify your bisection end result, i.e. do
>> what i suggested:
>>
>>
>
> yeah, I've done this on both kernels (three, to be exact), and all boot
> after reverting "Fix perf-tracepoint OOPS."
>
> As for my system, I'm still convinced that I might be doing something wrong
> over here.
>
>>>> Could you please double-check the bisection result by doing this:
>>>>
>>>>   git revert af6af30c0f
>>>>
>>>> on the latest kernel and seeing whether that fixes the lockup?
>>>>
>>
>> if this doesn't fix it on latest -git then this commit is not the
>> cause of the lockup.
>>
>>        Ingo
>>
>>
>
> Reverting this commit (Fix perf-tracepoint OOPS.) does fix my stuckage, but
> I'm left, as well as others, asking the question of why.
> In any case I still think I'm setting something wrong with either gcc, or
> something that might be causing this from userland.
>
> Justin P. Mattock
>
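
On the quoted question of whether the bisect log survives a
git bisect reset, and on the 'backtrack' suggestion: a minimal sketch,
assuming a throwaway repository rather than the kernel tree. The file
names version.txt and bug.txt, the eight commits, and the variable names
are all hypothetical; only git bisect start/run/log/reset/replay are
real commands. It saves the log before the reset and replays it after.

```shell
#!/bin/sh
# Throwaway repository: eight commits, with a hypothetical "bug"
# (the file bug.txt) first appearing in commit 5.
repo=$(mktemp -d)
log=$(mktemp)
cd "$repo" || exit 1
git init -q .
git config user.email "sketch@example.invalid"
git config user.name "Bisect Sketch"
i=1
while [ "$i" -le 8 ]; do
    echo "$i" > version.txt
    if [ "$i" -ge 5 ]; then echo broken > bug.txt; fi
    git add -A
    git commit -q -m "commit $i"
    i=$((i + 1))
done
bad_expected=$(git rev-parse HEAD~3)          # commit 5
first=$(git rev-list --max-parents=0 HEAD)    # commit 1

# Bisect between commit 1 (good) and HEAD (bad).
# For "git bisect run", exit status 0 means good and 1 means bad.
git bisect start HEAD "$first" >/dev/null
git bisect run sh -c 'test ! -f bug.txt' >/dev/null

# Save the log BEFORE resetting; the reset ends the bisection session.
git bisect log > "$log"
git bisect reset >/dev/null 2>&1

# After the reset, "git bisect log" has nothing left to report.
if git bisect log >/dev/null 2>&1; then
    log_after_reset=present
else
    log_after_reset=gone
fi

# Replaying the saved log restores the bisection state. To backtrack,
# one would first edit the saved file and drop the suspect final steps.
git bisect replay "$log" >/dev/null 2>&1
bad_found=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "log after reset: $log_after_reset"
echo "first bad commit matches commit 5: $([ "$bad_found" = "$bad_expected" ] && echo yes || echo no)"
```

With a real tree the same idea applies: keep a copy of the log file,
trim the steps in doubt, replay, and continue marking good/bad from
there.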

O.k., here is something awkward about the issue I was
experiencing. At the moment I have two iMacs;
here are the descriptions:

imac A) the one with the problem

OS: built from the clfs book
x86_64 multilib with only lib64

built everything with these flags:
CFLAGS="-m64 -mtune=core2 -march=core2
-mfpmath=both -O2 -pipe -fomit-frame-pointer
-fstack-protection"
CXXFLAGS="${CFLAGS}" MAKEOPTS="{-j3}"
while compiling everything with
gcc version: 4.5.0 20090730


imac B) the one that works

OS: clfs(just built a few days ago)
x86_64 pure64 bit build
(lib with a symlink to lib64)
CFLAGS="-m64 -mtune=core2 -march=core2
 -O2 -pipe -fomit-frame-pointer"
CXXFLAGS="${CFLAGS}" MAKEOPTS="{-j3}"
gcc version: 4.4.1 (GCC for Cross-LFS 4.4.1.20090722)

The only things I can think of are that either I hit something
because of gcc, something goes wrong with the libraries,
or there is something happening with either the
-mfpmath=both option or the stack protection flag.

At this point, since the kernel seems to be running fine,
the plan is to just trash the system that has this issue and leave
it at that: I was hitting some weird anomaly.
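
To narrow down which build flags differ between the two machines, a
small sketch that compares the two CFLAGS lists exactly as given above.
The helper name only_in_first and the variable names are hypothetical;
the flag strings are copied verbatim from this message (GCC's own
spelling of the stack protector option is -fstack-protector, so the
last flag may not be what was actually passed to the compiler).

```shell
#!/bin/sh
# CFLAGS as reported for each machine, copied verbatim from the message.
flags_a="-m64 -mtune=core2 -march=core2 -mfpmath=both -O2 -pipe -fomit-frame-pointer -fstack-protection"
flags_b="-m64 -mtune=core2 -march=core2 -O2 -pipe -fomit-frame-pointer"

# Print the flags that appear in the first list but not in the second,
# separated by single spaces (hypothetical helper, not a standard tool).
only_in_first() {
    out=""
    for f in $1; do
        case " $2 " in
            *" $f "*) ;;                     # flag is in both lists
            *) out="${out:+$out }$f" ;;      # flag is unique to the first
        esac
    done
    printf '%s\n' "$out"
}

only_a=$(only_in_first "$flags_a" "$flags_b")
only_b=$(only_in_first "$flags_b" "$flags_a")

echo "only on imac A: ${only_a:-none}"
echo "only on imac B: ${only_b:-none}"
```

The flags left over on imac A, together with the different gcc versions
and the multilib vs. pure64 layout, are the candidates to retest one at
a time.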


-- 
Justin P. Mattock