Message-ID: <20080922112452.GB5314@ff.dom.local>
Date: Mon, 22 Sep 2008 11:24:52 +0000
From: Jarek Poplawski <jarkao2@...il.com>
To: Badalian Vyacheslav <slavon@...telecom.ru>
Cc: Denys Fedoryshchenko <denys@...p.net.lb>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: Machine Check Exception Re: NetDev! Please help!
On Mon, Sep 22, 2008 at 01:40:35PM +0400, Badalian Vyacheslav wrote:
> Thanks for the answer, Jarek!
> I posted it to the bug tracker: http://bugzilla.kernel.org/show_bug.cgi?id=11618
>
> I don't think it is a hardware error, because we have this problem on 10
> servers running the 2.6.26.2 kernel +)
> On Friday night I compiled 2.6.26.5 and got 2 panics on the one PC that has
> the maximum load, and 1 panic on another PC.
> I wrote to the netdev list because the first messages look like this:
>
> [ 4956.420298] CPU 1: Machine Check Exception: 0000000000000005
> [ 4956.420298] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
> [ 4956.420300] Tx Queue <0>
> [ 4956.420300] TDH <81>
> [ 4956.420301] TDT <81>
> [ 4956.420302] next_to_use <81>
> [ 4956.420302] next_to_clean <d6>
> [ 4956.420303] buffer_info[next_to_clean]
> [ 4956.420303] time_stamp <15498d>
> [ 4956.420304] next_to_watch <d6>
> [ 4956.420304] jiffies <15511c>
> [ 4956.420305] next_to_watch.status <1>
> [ 4956.420537] eth1: Detected Tx Unit Hang:
> [ 4956.420538] TDH <b0>
> [ 4956.420538] TDT <b0>
> [ 4956.420539] next_to_use <b0>
> [ 4956.420539] next_to_clean <5>
> [ 4956.420540] buffer_info[next_to_clean]:
> [ 4956.420540] time_stamp <15498e>
> [ 4956.420541] next_to_watch <5>
> [ 4956.420542] jiffies <15511c>
> [ 4956.420542] next_to_watch.status <1>
> [ 4956.423064] CPU 1: Bank 0: 3200004000000800
> [ 4956.423190] CPU 1: Bank 5: 3200220024080400
> [ 4956.423315] Kernel panic - not syncing: CPU context corrupt
> [ 4956.423933] Rebooting in 3 seconds..
Yes, Tx Unit Hang messages like these often point to netdev problems,
but not when they come together with a Machine Check Exception and
"CPU context corrupt", which should mean a severe hardware problem
(unless some bug, probably not in netdev, triggers it).
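To show why these lines point at the hardware, here is a minimal sketch
that decodes the values from your log. It assumes the architectural
IA32_MCG_STATUS and IA32_MCi_STATUS bit layout described in the Intel
SDM, and that "Machine Check Exception: ..." prints the global status
while "Bank N: ..." prints the per-bank status. The helper names are
hypothetical, and this is not the kernel's own decoder.

```python
# Assumed bit layout (Intel SDM, Machine-Check Architecture chapter).
# All names below are illustrative, not kernel or library APIs.
MCG_FLAGS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}
MCI_FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register is valid
    58: "ADDRV",  # ADDR register is valid
    57: "PCC",    # processor context corrupt
}


def set_flags(value, table):
    """Return the names of the flags set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if (value >> bit) & 1]


def decode_mcg_status(value):
    """Decode the global machine-check status word."""
    return set_flags(value, MCG_FLAGS)


def decode_mci_status(value):
    """Decode one per-bank machine-check status word."""
    return {
        "flags": set_flags(value, MCI_FLAGS),
        "model_specific_code": (value >> 16) & 0xFFFF,
        "mca_error_code": value & 0xFFFF,
    }


if __name__ == "__main__":
    # Values copied from the quoted log.
    print("MCG status:", decode_mcg_status(0x0000000000000005))
    for bank, status in ((0, 0x3200004000000800), (5, 0x3200220024080400)):
        info = decode_mci_status(status)
        print("Bank %d: flags=%s model=0x%04x mca=0x%04x" % (
            bank, info["flags"],
            info["model_specific_code"], info["mca_error_code"]))
```

Under these assumptions both banks have UC and PCC set, which would
match the "CPU context corrupt" panic. VAL shows as clear in the printed
values; that may be because the kernel masks it before printing, but I
have not verified this for your configuration.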
>
> But on 2.6.26.5 I have not seen errors like this for 2 days... Also, if the system has no network load, I can't trigger a panic with cpuburn or by compiling sources...
> Anyway, I think it is good that my message also goes to the general mailing list and bugzilla...
>
> I will try to get more info... If you or anyone has an idea how to test this bug, I can do it)
I see you have already got some advice in bugzilla. Those people really
know more about these things, so you should try that first. I think they
expect you to build the most current kernel version (tip) from git. You
can do this using the instructions from Ingo Molnar's README: make a
script from it, from the beginning up to the "git checkout ..." line.
Of course you have to install git first. Running the commands downloads
the kernel sources into a subdirectory (this takes time). Copy your
config there, run make oldconfig, make, etc. Then send them the dmesg
output after rebooting. If you have any problems, write. Alternatively,
I guess, you could at least try the current 2.6.27-rc7 kernel.
Jarek P.
BTW: could you try to trigger this bug with one network card off?