linux-kernel - Re: Fw: 2.6.17 oops, possibly ntfs/mmap related

Date:	Fri, 22 Sep 2006 16:47:00 +0930 (CST)
From:	Jonathan Woithe <jwoithe@...sics.adelaide.edu.au>
To:	davej@...hat.com (Dave Jones)
Cc:	hugh@...itas.com (Hugh Dickins), akpm@...l.org (Andrew Morton),
	aia21@....ac.uk (Anton Altaparmakov),
	jwoithe@...sics.adelaide.edu.au (Jonathan Woithe),
	linux-kernel@...r.kernel.org
Subject: Re: Fw: 2.6.17 oops, possibly ntfs/mmap related

> On Thu, Sep 21, 2006 at 08:04:49PM +0100, Hugh Dickins wrote:
> 
>  >   BUG: unable to handle kernel paging request at virtual address 0010c744
>  >    printing eip:
>  >   c013be50
>  >   *pde = 00000000
>  >   Oops: 0002 [#1]
>  >   Modules linked in: ntfs 8139too via_agp agpgart usb_storage ehci_hcd uhci_hcd usbcore
>  >   CPU:    0
>  >   EIP:    0060:[<c013be50>]    Tainted: G   M  VLI
>  >   EFLAGS: 00010282   (2.6.17 #2) 
>  >   EIP is at anon_vma_unlink+0x16/0x3c
>  >   eax: 0010c740   ebx: cf1070cc   ecx: cf107104   edx: cf8bc740
>  >   esi: cf8bc740   edi: b7e82000   ebp: 00000000   esp: cdad7f58
>  > 
>  > I haven't worked out the disassembly in detail to support the idea
>  > (though certainly anon_vma_unlink would be trying to list_del around
>  > here), but eax and esi do suggest a corrupted list: somehow the
>  > top half of a pointer has been overwritten by the top half of LIST_POISON1.
>  > 
>  > And in Anton's case, the top half of a pointer was overwritten by the
>  > bottom half of LIST_POISON2.
>  > 
>  > Maybe just coincidence, and I've nothing more illuminating to add;
>  > but just a hint of a list_del going very wrong somewhere?
> 
> Given that a machine check happened, the state of the machine in general
> is questionable.  I'd recommend a run of memtest86+.

That was already done.  No memory errors were reported over 10 passes.

Secondly, the machine check indication (the "M" in the Tainted flags) was
present on only one of the two oopses we saw.  Furthermore, no log file
showed any sign that a machine check had occurred in the case of the
second oops.  Then again, perhaps machine checks aren't logged, which
would make this observation irrelevant.

Could we be looking at a dying CPU?

Regards
  jonathan