Message-Id: <200711300848.15424.el@prans.net>
Date: Fri, 30 Nov 2007 08:48:15 -0500
From: Elvis Pranskevichus <el@...ns.net>
To: Stephen Hemminger <shemminger@...ux-foundation.org>
Cc: Paul Collins <paul@...ly.ondioline.org>, netdev@...r.kernel.org
Subject: Re: sky2: eth0: hung mac 7:69 fifo 0 (165:176)
On Sun November 25 2007 04:57:42 pm Elvis Pranskevichus wrote:
> On Sunday November 25 2007 04:25:06 pm Stephen Hemminger wrote:
> > Two important bits of data:
> >
> > 1) What is the hardware? Output of lspci and dmesg would be useful to
> > know which type of board is involved.
>
> uname -srvm:
>
> Linux 2.6.24-rc3 #1 SMP PREEMPT Sat Nov 17 00:26:41 EST 2007 x86_64
>
> CONFIG_NO_HZ=y
>
> lspci -vvvv:
>
> 03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 22)
>   Subsystem: Giga-byte Technology Marvell 88E8053 Gigabit Ethernet Controller (Gigabyte)
>   Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>   Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>   Latency: 0, Cache Line Size: 32 bytes
>   Interrupt: pin A routed to IRQ 315
>   Region 0: Memory at f1000000 (64-bit, non-prefetchable) [size=16K]
>   Region 2: I/O ports at a000 [size=256]
>   [virtual] Expansion ROM at f0000000 [disabled] [size=128K]
>   Capabilities: [48] Power Management version 2
>     Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
>     Status: D0 PME-Enable- DSel=0 DScale=1 PME-
>   Capabilities: [50] Vital Product Data
>   Capabilities: [5c] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 Enable+
>     Address: 00000000fee0300c  Data: 4199
>   Capabilities: [e0] Express (v1) Legacy Endpoint, MSI 00
>     DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
>       ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
>     DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
>       RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>       MaxPayload 128 bytes, MaxReadReq 512 bytes
>     DevSta: CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
>     LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Latency L0 <256ns, L1 unlimited
>       ClockPM- Suprise- LLActRep- BwNot-
>     LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk-
>       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>     LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>   Capabilities: [100] Advanced Error Reporting
>
> dmesg | grep sky2:
>
> sky2 0000:03:00.0: v1.20 addr 0xf1000000 irq 16 Yukon-EC (0xb6) rev 2
> sky2 eth0: addr 00:16:e6:84:58:5d
> sky2 eth0: enabling interface
> sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both
>
> Error related part:
>
> sky2 eth0: hung mac 123:3 fifo 194 (150:144)
> sky2 eth0: receiver hang detected
> sky2 eth0: disabling interface
> NETDEV WATCHDOG: eth0: transmit timed out
> sky2 eth0: tx timeout
> sky2 eth0: transmit ring 178 .. 188 report=178 done=178
> NETDEV WATCHDOG: eth0: transmit timed out
> sky2 eth0: tx timeout
> sky2 eth0: transmit ring 178 .. 188 report=178 done=178
> ...
> <repeats endlessly>
>
> > 2) Is this a regression, or has it always been the case? Does 2.6.23 work okay?
>
> 2.6.23 works okay in terms of restarting the controller properly,
> i.e. sky2_watchdog() actually works. In 2.6.24, however, I only see
> sky2_down() being called; it never gets to sky2_up(). Moreover, the entire
> box becomes unresponsive to events (e.g. the keyboard stops working).
>
> > The problems with FIFO in the past have been limited to Yukon-EC
> > without flow control. The hardware has bugs where if the FIFO gets
> > exactly filled it hangs. Flow control avoids the problem.
>
> Yeah, unfortunately it's Yukon-EC.
>
>
> Thanks,
Hi Stephen,

I was able to investigate this issue a little further by adding a bunch of
printks in the problem area. What I discovered is that when the card hangs
and sky2_watchdog() kicks in, the sky2_restart() path gets stuck at
napi_synchronize() in sky2_down():

@@ -1699,6 +1695,9 @@ static int sky2_down(struct net_device *dev)
	ctrl &= ~(GM_GPCR_TX_ENA | GM_GPCR_RX_ENA);
	gma_write16(hw, port, GM_GP_CTRL, ctrl);
	/* Make sure no packets are pending */
-->	napi_synchronize(&hw->napi);

This call was introduced by commit 6de16237c78a9d: sky2: shutdown cleanup.

My guess is that NAPI still tries hard to send some packets even though at
that point the card is not capable of sending anything, so the loop inside
napi_synchronize() never terminates.

I've removed this line for now to see if it helps on the next hang =)
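
To illustrate the guess above, here is a minimal user-space sketch of the failure mode. It assumes, as the guess does, that napi_synchronize() simply loops until the poll is no longer scheduled. None of this is kernel code: struct fake_napi, fake_tick() and fake_synchronize_bounded() are hypothetical names made up for the illustration, and the bounded wait is only one conceivable way to avoid spinning forever, not what the driver does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the "poll is scheduled" state. */
struct fake_napi {
	bool scheduled;	/* true while a poll is still pending */
	int ticks_left;	/* ticks until the poll completes; negative = never (hung hardware) */
};

/* Advance the simulation by one tick; a healthy poll eventually completes. */
static void fake_tick(struct fake_napi *n)
{
	if (n->ticks_left < 0)
		return;			/* hung: the pending poll never finishes */
	if (n->ticks_left > 0)
		n->ticks_left--;
	if (n->ticks_left == 0)
		n->scheduled = false;	/* poll done, state cleared */
}

/*
 * Wait for the pending poll to finish, but give up after max_ticks.
 * Returns the number of ticks waited, or -1 on timeout.
 * An unbounded version of this loop would spin forever in the hung case.
 */
static int fake_synchronize_bounded(struct fake_napi *n, int max_ticks)
{
	int waited = 0;

	assert(max_ticks > 0);
	while (n->scheduled) {
		if (waited == max_ticks)
			return -1;
		fake_tick(n);
		waited++;
	}
	return waited;
}
```

With a healthy device (ticks_left = 3) the wait returns after three ticks. With ticks_left = -1 the scheduled state never clears, which is the situation I suspect sky2_down() ends up in, and only the bound lets the caller get out.
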
Thanks,
--
Elvis