[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <46538AEE.4030700@linux-foundation.org>
Date: Tue, 22 May 2007 17:29:34 -0700
From: Stephen Hemminger <shemminger@...ux-foundation.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
CC: Mike Houston <mikeserv@...s.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 2.6.22-rc2
Linus Torvalds wrote:
> On Tue, 22 May 2007, Mike Houston wrote:
>
>> In this case I actually had the kernel crash. First time for me ever
>> having a kernel oops! System locked up with keyboard LED's blinking.
>>
>> Not sure if anyone wants to see all of it (maybe some screwy
>> userland stuff involved), so I won't include that mess in the
>> message. It's here:
>> http://www.mikeserv.org/files/kernelcrash.txt
>>
>
> I think you have major memory corruption. That first oops disassembles to
>
> mov 0x10(%eax),%esi
> mov $0xfffffdfd,%eax
> test %esi,%esi
> je after_call
> mov %edx,%ecx
> mov %edi,%eax
> mov %ebx,%edx
> call *%esi
> after_call:
>
> which is (from net/ipv4/af_inet.c, inet_ioctl()):
>
> default:
> if (sk->sk_prot->ioctl)
> err = sk->sk_prot->ioctl(sk, cmd, arg);
> else
> err = -ENOIOCTLCMD;
> break;
>
> and the load off "sk->sk_prot->ioctl" oopses, because "sk->sk_prot" is
> corrupt and contains 0x8e3cad42, which is not a valid kernel pointer.
>
> The other oops is even worse.
>
> I also think it meshes with
>
> sky2 eth0: descriptor error q=0x280 get=285 [800042375e2e5e] put=285
>
>
Descriptor error means, the driver told it to do something but the
OWNER bit wasn't set.
Only ever saw this on the Gigabyte motherboard.
It looks like the chip reads the wrong memory sometimes. The problem
happens only on the on-board NIC's
and only on this kind of motherboard. For testing, I have put code in
to check that the receive data actually
arrived before the IRQ, it triggered on my Gigabyte 925 motherboard. It
appears that DMA access
is messed up. This board has lots of "overclocker" friendly stuff; maybe
the BIOS never really sets up the PCI
bridges and clocks properly.
It doesn't seem like a software or driver problem. I have tried tweaking
PCI registers but nothing worked
in this case.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists