Message-ID: <20080422190901.GA1104@elte.hu>
Date:	Tue, 22 Apr 2008 21:09:01 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Jiri Slaby <jirislaby@...il.com>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	"Rafael J. Wysocki" <rjw@...k.pl>, paulmck@...ux.vnet.ibm.com,
	David Miller <davem@...emloft.net>,
	linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
	linux-ext4@...r.kernel.org, herbert@...dor.apana.org.au,
	Zdenek Kabelac <zdenek.kabelac@...il.com>,
	"H. Peter Anvin" <hpa@...or.com>
Subject: Re: 2.6.25-git2: BUG: unable to handle kernel paging request at
	ffffffffffffffff


* Ingo Molnar <mingo@...e.hu> wrote:

> > Yesterday I did 2 suspend/resumes after 1 hour of uptime and ran 
> > git-status for a fraction of a second until it was killed. So I can 
> > perfectly reproduce it when I suspend, resume and produce some io 
> > load. I guess it's time to bisect 2.6.25-rc8-mm2 as I'm able to 
> > reproduce it the best and haven't seen that bug in -rc8-mm1 for over 
> > a week of suspending and working.
> 
> the most dangerous x86 change we added was the PAT stuff. Does it 
> influence the crashes in any way if you boot with 'nopat' or if you 
> disable CONFIG_X86_PAT in the .config?

note that you should only get full PAT (where in essence Linux takes 
over control of the cache attributes via PTEs, instead of relying on the 
BIOS-initialized MTRRs alone) with -mm or with x86.git applied.

I.e. x86 PAT might explain any -mm issue but not the upstream -git 
issue.

In upstream -git we don't have the second wave of the PAT changes applied 
yet (the /dev/mem bits), so CONFIG_X86_PAT is not yet activated. (it's 
only safe to enable once we have all the changes together and fully 
control all cache attributes in the system)
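
For anyone who wants to confirm which case a test box is in, here is a minimal sketch (not part of any kernel tooling) that checks a .config text for CONFIG_X86_PAT=y and a kernel command line for 'nopat'. The function names and the /boot/config-<release> path are assumptions for illustration; distros differ in where they keep the config, and the script only reports what was configured and requested, not what the CPU actually ended up doing.

```python
import os


def pat_config_enabled(config_text):
    """Return True if the .config text has CONFIG_X86_PAT=y set."""
    for line in config_text.splitlines():
        # A disabled option shows up as "# CONFIG_X86_PAT is not set",
        # which does not match the exact string below.
        if line.strip() == "CONFIG_X86_PAT=y":
            return True
    return False


def nopat_on_cmdline(cmdline):
    """Return True if 'nopat' appears as its own word on the command line."""
    return "nopat" in cmdline.split()


def pat_expected(config_text, cmdline):
    """PAT is expected only if it is built in and not disabled at boot."""
    return pat_config_enabled(config_text) and not nopat_on_cmdline(cmdline)


if __name__ == "__main__":
    # Assumed locations; adjust for the system being tested.
    config_path = "/boot/config-" + os.uname().release
    with open(config_path) as f:
        config_text = f.read()
    with open("/proc/cmdline") as f:
        cmdline = f.read()
    print("PAT expected:", pat_expected(config_text, cmdline))
```

If this reports that PAT is not expected on an upstream -git kernel, that is consistent with the point above: PAT itself cannot be what is causing cache attribute conflicts there.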

i.e. PAT complications here would not take the form of real cache 
attribute conflicts [so the lockups and corruptions cannot be due to 
that] - but could show up as side-effects of the other code it changes.

and most of the PAT failures we ever saw had different patterns anyway: 
the leading failures were API rejections, and hence non-working Xorg or 
non-working ioremap() in certain drivers. The worst-case scenario, early 
in the PAT code's cycle, was a spontaneous triple fault - months ago.

the basis for the PAT changes was the hardening of the CPA code and its 
general use for everything (such as DEBUG_PAGEALLOC). And much of that 
happened and was finished in v2.6.25. Nothing conceptually new really 
happened there - and even where we touched the code in .26 it happened 
long ago and would have surfaced by now.

... but ... nothing can be excluded.

	Ingo
