lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 3 Mar 2008 08:46:20 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Arjan van de Ven <arjan@...ux.intel.com>
Cc:	torvalds@...ux-foundation.org, hans.rosenfeld@....com,
	linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>
Subject: Re: bisected boot regression post 2.6.25-rc3.. please revert


* Arjan van de Ven <arjan@...ux.intel.com> wrote:

> Hi Linus, Ingo, Hans,
> 
> Please revert commit cded932b75ab0a5f9181ee3da34a0a488d1a14fd ( x86: 
> fix pmd_bad and pud_bad to support huge pages ) since it prevents the 
> kernel to finish booting on my (Penryn based) laptop. The boot stops 
> right after freeing the init memory. Took a while to bisect (due to it 
> touching page*.h, which forces a full recompile), but it definitely is 
> caused by this commit...

hm, lets figure out why this patch breaks your box, ok? We obviously 
have to revert it if we cannot figure it out, but lets at least try - 
because the patch itself fixes a real regression and it's not obviously 
wrong either. I think there might be some bug hiding somewhere that we 
really want to fix instead of this revert.

Could you try to hack up a debug patch perhaps? Uninline the fucntion(s) 
then add two versions (one is that breaks on your box and one is that 
works on your box) of this same pmd_bad()/pud_bad() functions and do 
something like this (pseudocode):

pmd_bad()
{
	if (pmd_bad_working(x) != pmd_bad_broken(x))
		panic_timeout++;

	return pmd_bad_working(x);
}

i.e. we actually use the working function so your box should boot up 
just fine - but we instrument things with the broken function as well 
and detect the cases where the two values differ. It is an anomaly if 
either function ever returns true instead of false.

if after bootup you have a non-zero panic_timeout then there is a 
material difference somewhere. In that case try to stick a dump_stack() 
into the above case as well - and if the box still boots, send us the 
stackdumps. (if the box doesnt boot then perhaps the printout itself 
hangs - in that case try a save_stack_trace() hack and print it out 
later)

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ