Message-ID: <m1mykqfvu6.fsf@frodo.ebiederm.org>
Date: Wed, 09 Jul 2008 17:04:33 -0700
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Mike Travis <travis@....com>
Cc: "H. Peter Anvin" <hpa@...or.com>,
Jeremy Fitzhardinge <jeremy@...p.org>,
Christoph Lameter <cl@...ux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...e.hu>,
Andrew Morton <akpm@...ux-foundation.org>,
Jack Steiner <steiner@....com>
Subject: Re: [crash, bisected] Re: [PATCH 3/4] x86_64: Fold pda into per cpu area
Mike Travis <travis@....com> writes:
> What I meant was using early_printk in place of printk, which seems to stuff the
> messages into the log buf until the serial console is set up fairly late in
> start_kernel.
> I did this by removing printk() and renaming early_printk() to be printk (and a
> couple other things like #define early_printk printk ...)
Last I looked, after the magic early_printk setup, printk calls early_printk
and stuffs messages in the log buffer.
It matters little though, as long as you get the print messages. Weird
cases where you don't get into C code worry me much more.
Once you get into C, things are much easier to track.
>> Is stack overflow the only problem you are seeing or are there still other
>> mysteries?
>
> I'm not entirely sure it's a stack overflow, the fault has a NULL dereference
> and then the stack overflow message.
Ok. Interesting.
>>> Only a few of these though I would think might get called early in
>>> the boot, that might also be contributing to the stack overflow.
>>
>> Still the call chain depth shouldn't really be changing. So why should it
>> matter? Ah. The high cpu count is growing cpumask_t, so it gets large when
>> you put it on the stack. That makes sense. So what starts out as a 4 byte
>> variable on the stack in a normal setup winds up being a 1k variable
>> with 4k cpus.
>
> Yes, it's definitely the three in combination:
>
> NR_CPUS  Patch_Applied  THREAD_ORDER  Results
> 256      NO             1             works (obviously ;-)
> 256      YES            1             works
> 4096     NO             1             works
> 4096     YES            1             panics
> 4096     YES            3             works (just happened to pick 3,
>                                       2 probably will work as well.)
> I've been testing NR_CPUS=4096 for quite a while and it's been very
> reliable. It's just weird that this config fails with this new patch
> applied. (default configs and some fairly normal distro configs also
> work fine.) And with the zillion config straws we now have, spotting
> the arbitrary needle is proving difficult. ;-)
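
For reference, the arithmetic behind the cpumask_t growth and the THREAD_ORDER
column can be sketched roughly as below. This is a userspace illustration, not
kernel code: the helper names (cpumask_bytes, stack_cost, thread_size) and the
4 KiB page size are assumptions made for the example, and the mask is assumed
to be a plain bitmap rounded up to whole unsigned longs. By this count a
4096-CPU mask comes to 512 bytes, so the sizes quoted above are best read as
rough orders of magnitude.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bits in one unsigned long on the build machine (64 on x86_64). */
#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/* Assumed page size for the sketch; x86_64 uses 4 KiB base pages. */
#define PAGE_SIZE_SKETCH ((size_t)4096)

/*
 * Hypothetical helper: bytes needed for a bitmap holding one bit per
 * possible CPU, rounded up to a whole number of unsigned longs.
 */
static size_t cpumask_bytes(size_t nr_cpus)
{
    size_t longs;

    assert(nr_cpus > 0);
    longs = (nr_cpus + BITS_PER_LONG_SKETCH - 1) / BITS_PER_LONG_SKETCH;
    return longs * sizeof(unsigned long);
}

/*
 * Hypothetical helper: stack bytes consumed when `masks_on_stack` such
 * masks are live at once somewhere along a call chain.
 */
static size_t stack_cost(size_t nr_cpus, size_t masks_on_stack)
{
    return masks_on_stack * cpumask_bytes(nr_cpus);
}

/* Kernel stack size for a given order: PAGE_SIZE << THREAD_ORDER. */
static size_t thread_size(unsigned int thread_order)
{
    return PAGE_SIZE_SKETCH << thread_order;
}
```

Under these assumptions, sixteen 4096-CPU masks live on the stack at the same
time would fill an entire order-1 (8 KiB) stack, while the same sixteen masks
at NR_CPUS=256 take only 512 bytes. An order-3 stack is 32 KiB, which would
leave far more headroom.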
Right. Just please split your patch up. It would be good to see
if simply changing the per cpu segment address to 0 is related
to your problem. Or is it the other logic changes necessary to
use the pda as a per cpu variable?
I just noticed that we always allocate the pda in the per cpu section.
> One reason I've been sticking with 4.2.4.
Eric