Message-ID: <m1mykqfvu6.fsf@frodo.ebiederm.org>
Date:	Wed, 09 Jul 2008 17:04:33 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Mike Travis <travis@....com>
Cc:	"H. Peter Anvin" <hpa@...or.com>,
	Jeremy Fitzhardinge <jeremy@...p.org>,
	Christoph Lameter <cl@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...e.hu>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Jack Steiner <steiner@....com>
Subject: Re: [crash, bisected] Re: [PATCH 3/4] x86_64: Fold pda into per cpu area

Mike Travis <travis@....com> writes:

> What I meant was using early_printk in place of printk, which seems to stuff the
> messages into the log buf until the serial console is setup fairly late in
> start_kernel.
> I did this by removing printk() and renaming early_printk() to be printk (and a
> couple other things like #define early_printk printk ...

Last I looked, after the magic early_printk setup, printk calls early_printk
and stuffs messages in the log buffer.

It matters little though.  As long as you get the print messages.  Weird
cases where you don't get into C code worry me much more.

Once you get into C things are much easier to track.

>> Is stack overflow the only problem you are seeing or are there still other
>> mysteries?
>
> I'm not entirely sure it's a stack overflow, the fault has a NULL dereference
> and then the stack overflow message.

Ok.  Interesting.

>>> Only a few of these though I would think might get called early in
>>> the boot, that might also be contributing to the stack overflow.
>> 
>> Still the call chain depth shouldn't really be changing.  So why should it
>> matter?  Ah.  The high cpu count is growing cpumask_t so when you put
>> it on the stack.  That makes sense.  So what starts out as a 4 byte
>> variable on the stack in a normal setup winds up being a 1k variable
>> with 4k cpus.
>
> Yes, it's definitely the three related:
>
> NR_CPUS Patch_Applied THREAD_ORDER Results
>   256        NO           1        works (obviously ;-)
>   256        YES          1        works
>  4096        NO           1        works
>  4096        YES          1        panics
>  4096        YES          3        works (just happened to pick 3,
>                                    2 probably will work as well.)
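
For reference, a rough sketch of the arithmetic behind the table above.  It
assumes cpumask_t is a bitmap of NR_CPUS bits rounded up to whole unsigned
longs, and that the kernel stack is PAGE_SIZE << THREAD_ORDER with 4 KiB
pages.  The helper names (bits_to_longs, cpumask_bytes, thread_size) are
made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bits in one unsigned long, analogous to the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Number of unsigned longs needed to hold nr bits (rounds up). */
static size_t bits_to_longs(size_t nr)
{
    return (nr + BITS_PER_LONG - 1) / BITS_PER_LONG;
}

/* Bytes taken by one cpumask-style bitmap for a given NR_CPUS. */
static size_t cpumask_bytes(size_t nr_cpus)
{
    assert(nr_cpus > 0);
    return bits_to_longs(nr_cpus) * sizeof(unsigned long);
}

/* Kernel stack size for a given THREAD_ORDER, assuming 4 KiB pages. */
static size_t thread_size(unsigned int thread_order)
{
    return (size_t)4096 << thread_order;
}
```

Under these assumptions cpumask_bytes(4096) is 512 and thread_size(1) is
8192, so each on-stack mask at NR_CPUS=4096 takes 1/16 of the stack at
THREAD_ORDER=1.  A handful of masks in one call chain adds up quickly.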

> I've been testing NR_CPUS=4096 for quite a while and it's been very
> reliable.  It's just weird that this config fails with this new patch
> applied.  (default configs and some fairly normal distro configs also
> work fine.)  And with the zillion config straws we now have, spotting
> the arbitrary needle is proving difficult. ;-) 

Right.  Just please split your patch up.  It would be good to see
if simply changing the per cpu segment address to 0 is related
to your problem, or if it is the other logic changes necessary to
use the pda as a per cpu variable.

I just noticed that we always allocate the pda in the per cpu section.

> One reason I've been sticking with 4.2.4.

Eric