Message-ID: <46EEAB26.6050400@sophics.cz>
Date: Mon, 17 Sep 2007 18:28:22 +0200
From: Petr Stehlik <pstehlik@...hics.cz>
To: linux-kernel@...r.kernel.org
Subject: forcedeth kernel panic
Hi,
an ASUS M2N32 WS Pro (nVidia MCP55 chipset) based machine with on-board
Gbit ethernet runs into a kernel panic under high network load.
The machine is meant to be a Samba server and has a minimal 64-bit Debian
Etch installed. It first crashed with the stock Debian 2.6.18-amd64
kernel, so I upgraded to 2.6.21 and finally to 2.6.22-2-amd64 (source
from Debian). The crashes varied per kernel but were always fatal (only a
hard reset helped), so I decided to post here as well (in addition to
Debian's BTS #442877).
The crash occurs under high network load generated by tbench from the
dbench package: within about 20 minutes of a tbench run (started from
another machine) against this server (which runs tbench_srv), the machine
panics.
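For reproducing it, the test boils down to something like this (a sketch
only - the client count and the server address below are examples, not
necessarily the exact values used here):

  # on the server under test
  tbench_srv

  # on the load-generating client
  tbench 16 192.168.0.2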
Before it crashes, it fills the kernel log with the following messages,
which may or may not be related to the crash:
Sep 17 14:51:27 harapes kernel: eth0: too many iterations (6) in nv_nic_irq.
Sep 17 14:51:58 harapes last message repeated 1026 times
Sep 17 14:52:59 harapes last message repeated 2063 times
Sep 17 14:54:00 harapes last message repeated 2055 times
Sep 17 14:55:01 harapes last message repeated 2044 times
I wrote that they may not be related because I have an older nForce-based
machine here that runs tbench against the crashing server, and it also
fills its log with the same messages - but fortunately it does not
crash...
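If I read the driver correctly, the "(6)" in the message means the
interrupt handler exceeded the driver's max_interrupt_work limit (5 by
default, if I'm not mistaken). One of the suggestions I found was to
raise that limit when loading the module, e.g.:

  modprobe forcedeth max_interrupt_work=15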
After killing the machine several times in a row I googled a bit and
found some suggestions, so now I am testing a different setup - the
forcedeth driver loaded with the "optimization_mode=1" parameter - and so
far (95 minutes of tbench) it hasn't crashed...
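For the record, this is how the parameter gets set (the file name under
/etc/modprobe.d below is an arbitrary choice of mine):

  # one-off, for testing:
  rmmod forcedeth
  modprobe forcedeth optimization_mode=1

  # persistently, in e.g. /etc/modprobe.d/forcedeth:
  options forcedeth optimization_mode=1

modinfo -p forcedeth lists the remaining parameters the driver accepts.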
More details about the hardware: AMD64 3600+ (= 2 GHz), 2 GB of DDR2, 6
SATA drives in RAID1 and RAID5 configurations on the on-board SATA
controller, a PCI S3 graphics card, and that's it.
dmesg output related to networking:
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
forcedeth: using HIGHDMA
eth0: forcedeth.c: subsystem: 01043:81fb bound to 0000:00:10.0
eth0: no IPv6 routers present
lspci -vv:
00:10.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a2)
Subsystem: ASUSTeK Computer Inc. Unknown device 81fb
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0 (250ns min, 5000ns max)
Interrupt: pin A routed to IRQ 1272
Region 0: Memory at fe02a000 (32-bit, non-prefetchable) [size=4K]
Region 1: I/O ports at b400 [size=8]
Region 2: Memory at fe029000 (32-bit, non-prefetchable) [size=256]
Region 3: Memory at fe028000 (32-bit, non-prefetchable) [size=16]
Capabilities: [44] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable+ DSel=0 DScale=0 PME-
Capabilities: [70] MSI-X: Enable- Mask- TabSize=8
Vector table: BAR=2 offset=00000000
PBA: BAR=3 offset=00000000
Capabilities: [50] Message Signalled Interrupts: Mask+ 64bit+
Queue=0/3 Enable+
Address: 00000000fee0300c Data: 4189
Masking: 000000fe Pending: 00000000
Capabilities: [6c] HyperTransport: MSI Mapping
An incomplete kernel panic dump, hand-copied from the frozen console:
Call Trace:
<IRQ> :forcedeth: nv_nic_irq_optimized+0x89/0x22c
handle_IRQ_event+0x25/0x53
__do_softirq+0x55/0xc3
handle_edge_irq+0xe4/0x127
do_IRQ+0x6c/0xd5
default_idle+0x0/0x3d
ret_from_intr+0x0/0xa
<EOI> default_idle+0x29/0x3d
cpu_idle+0x8b/0xae
Code: 8a 83 84 00 00 00 83 e0 f3 83 c8 04 88 83 84 00 00 00 83 7b
RIP :forcedeth:nv_rx_process_optimized+0xe6/0x380
Kernel panic - not syncing: Aiee, killing interrupt handler!
I may have to replace the on-board ethernet with some PCI-based card,
because I need a reliable server very soon, and once it is deployed I
won't have a chance to play with it anymore. So if there is a suggestion
I could try now to get forcedeth perfectly stable, please let me know
soon. Is "optimization_mode=1" the right solution? What kind of negative
impact does it have?
Thanks!
Petr