linux-kernel - Re: [Bug #11608] 2.6.27-rc6 BUG: unable to handle kernel paging request


Message-ID: <20080925030322.GC4401@wotan.suse.de>
Date:	Thu, 25 Sep 2008 05:03:22 +0200
From:	Nick Piggin <npiggin@...e.de>
To:	Chuck Ebbert <cebbert@...hat.com>
Cc:	"Rafael J. Wysocki" <rjw@...k.pl>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Kernel Testers List <kernel-testers@...r.kernel.org>,
	John Daiker <daikerjohn@...il.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [Bug #11608] 2.6.27-rc6 BUG: unable to handle kernel paging request

On Wed, Sep 24, 2008 at 08:46:55PM -0400, Chuck Ebbert wrote:
> On Sun, 21 Sep 2008 20:54:23 +0200 (CEST)
> "Rafael J. Wysocki" <rjw@...k.pl> wrote:
> 
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.26.  Please verify if it still should be listed and let me know
> > (either way).
> > 
> > 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=11608
> > Subject		: 2.6.27-rc6 BUG: unable to handle kernel paging request
> > Submitter	: John Daiker <daikerjohn@...il.com>
> > Date		: 2008-09-16 23:00 (6 days old)
> > References	: http://marc.info/?l=linux-kernel&m=122160611517267&w=4
> > 
> > 
> 
> As I said in the bugzilla entry:
> 
>   Oops: 000b
> 
>   Bit 3 is set -- the processor detected 1's in reserved bits of the page directory.
> 
> That can't be good...

[54384.988151] BUG: unable to handle kernel paging request at ffff8800601dd000
[54384.992095] IP: [<ffffffff80375457>] clear_page_c+0x7/0x10
[54384.992095] PGD 202063 PUD 8067 PMD 65d54163 PTE 80002020601dd163
[54384.992095] Oops: 000b [1] SMP DEBUG_PAGEALLOC
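
For reference, the error code on the Oops line can be decoded by hand. The
following is only a small sketch using the architectural x86 page-fault
error code layout; the table and function names are made up for
illustration and are not kernel code.

```python
# Architectural x86 page-fault error code bits (bit number, meaning).
# The names below are illustrative, not kernel identifiers.
PF_ERROR_BITS = [
    (0, "protection fault (page was present)"),
    (1, "write access"),
    (2, "user mode"),
    (3, "reserved bit set in a paging entry"),
    (4, "instruction fetch"),
]


def decode_pf_error(code):
    """Return the meaning of every bit set in a page-fault error code."""
    return [desc for bit, desc in PF_ERROR_BITS if (code >> bit) & 1]


# Error code taken from the "Oops: 000b" line above.
for desc in decode_pf_error(0x000b):
    print(desc)
```

For 000b this lists bits 0, 1 and 3: a write to a present page that hit a
reserved bit in one of the paging entries.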

I initially suspected PAT (maybe via DEBUG_PAGEALLOC)... but let's see if the
3rd line here is useful.

     xRRRRRRRRRRRRRRRRRRRRRRR|40b|<--MAXPHYS     PHYS-->|...RR.actuwp
PGD:                                         001000000010000001100011

     xRRRRRRRRRRRRRRRRRRRRRRR|40b|<--MAXPHYS     PHYS-->|...RR.actuwp
PUD:                                                 1000000001100111

     xRRRRRRRRRRRRRRRRRRRRRRR|40b|<--MAXPHYS     PHYS-->|...Rs.actuwp
PMD:                                 01100101110101010100000101100011

     xRRRRRRRRRRRRRRRRRRRRRRR|40b|<--MAXPHYS     PHYS-->|...gP.actuwp
PTE: 1000000000000000001000000010000001100000000111011101000101100011
     3210987654321098765432109876543210987654321098765432109876543210

Is this a CPU with 36-bit physical addresses? If so, you have 2 bits set in
the pte (bits 37 and 45) that are outside "maxphys". Or if it is a 40-bit
CPU, then you have just 1 bit (bit 45) outside maxphys, in which case I'd
say it is memory corruption (maybe a hardware bug, maybe a scribble from
elsewhere). So I'm wrong about PAT.
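
The same check can be done mechanically. This is a rough sketch, assuming
the usual 64-bit entry layout in which bits 12..51 hold the physical
address and anything at or above the CPU's physical address width is
reserved; the function name and the two candidate widths (36 and 40) are
only illustrative.

```python
PTE = 0x80002020601DD163  # PTE value from the oops above


def stray_address_bits(entry, maxphys_bits):
    """List the set bits in [maxphys_bits, 51] of a 64-bit paging entry.

    Bits 52..62 and bit 63 (NX) are deliberately not examined here.
    """
    return [b for b in range(maxphys_bits, 52) if (entry >> b) & 1]


for width in (36, 40):
    print(width, stray_address_bits(PTE, width))

# Address field (bits 12..51) with bits 37 and 45 cleared.
ADDR_MASK = ((1 << 52) - 1) & ~0xFFF
cleaned = PTE & ADDR_MASK & ~((1 << 37) | (1 << 45))
print(hex(cleaned))
```

With those two bits cleared the address field is 0x601dd000, which agrees
with the low bits of the faulting address ffff8800601dd000, so the rest of
the entry looks sane.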

Interestingly, the PMD also has a 1 set in a reserved bit (page global),
but according to the Intel docs, the CPU doesn't check that bit, so it
is not faulting there.
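
A quick way to see which low flag bits are set in the PMD is sketched
below. The bit names follow the leaf-entry naming of the legend above
(p, w, u, t, c, a, then dirty, page size, global); in an entry that points
to a page table, some of these positions are ignored by the CPU. The table
and function names are illustrative.

```python
PMD = 0x65D54163  # PMD value from the oops above

# Low flag bits of an x86 paging entry, named as for a leaf entry.
LOW_FLAGS = [
    (0, "present"),
    (1, "writable"),
    (2, "user"),
    (3, "PWT"),
    (4, "PCD"),
    (5, "accessed"),
    (6, "dirty"),
    (7, "page size"),
    (8, "global"),
]


def decode_flags(entry):
    """Return the names of the low flag bits that are set in an entry."""
    return [name for bit, name in LOW_FLAGS if (entry >> bit) & 1]


print(decode_flags(PMD))
```

The page size bit is clear, so this PMD points to a page table, and bit 8
(global) is the set bit referred to above.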

Does the machine survive memtest? Is the bug reproducible? If the
answer is no to either of these, I think we can take it off the
regression list. Otherwise, is it possible to track it down to a specific
commit?

Thanks,
Nick
