Message-ID: <20061214144734.03300fa6@freekitty>
Date: Thu, 14 Dec 2006 14:47:34 -0800
From: Stephen Hemminger <shemminger@...l.org>
To: Alex Romosan <romosan@...orax.lbl.gov>
Cc: netdev@...r.kernel.org
Subject: Re: 2.6.20-rc1 sky2 problems (regression?)
On Thu, 14 Dec 2006 14:25:06 -0800
Alex Romosan <romosan@...orax.lbl.gov> wrote:
> Stephen Hemminger <shemminger@...l.org> writes:
>
> > 4) What is the IRQ routing?
> > There are two issues here. First, the driver will never work with
> > edge-triggered IRQs. Second, some motherboards have a busted BIOS or
> > chipset that doesn't do MSI properly. A couple of module parameters
> > are available to help:
> > disable_msi=1 avoids using MSI
> > idle_timeout=10 polls for lost IRQs every N ms (10)
>
> it didn't take long to lock up the machine again. i've rebooted back
> into stock 2.6.20-rc1 and added the two module parameters above. cat
> /proc/interrupts now gives me:
>
> 17: 203 IO-APIC-fasteoi eth0, CMI8738
>
> so i guess the MSI interrupts are disabled. we'll see how this works.
That probably won't do much, but now the IRQ ends up shared (eth0 and
CMI8738 are both on IRQ 17).
> > 5) What are the messages in the console log when problem happens?
>
> kernel: NETDEV WATCHDOG: eth0: transmit timed out
> kernel: sky2 eth0: tx timeout
> kernel: sky2 eth0: transmit ring 402 .. 361 report=406 done=406
> kernel: sky2 status report lost?
The transmit timeout code tries to be smart, but doesn't really
recover properly if the hardware is stuck.
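
For anyone lining the timeout message up against the register dump, here is
a rough Python sketch that pulls the four numbers out of the log line. The
field meanings in the comments (ring consumer .. producer, then the reported
and done indexes) are an assumption about how the message is laid out, and
the ring size of 512 passed to pending() is only a placeholder value.

```python
import re

# Assumed layout: "transmit ring <cons> .. <prod> report=<n> done=<n>".
TX_TIMEOUT_RE = re.compile(
    r"transmit ring (\d+) \.\. (\d+) report=(\d+) done=(\d+)"
)


def parse_tx_timeout(line):
    """Return (cons, prod, report, done) as ints, or None if no match."""
    m = TX_TIMEOUT_RE.search(line)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())


def pending(cons, prod, ring_size):
    """Entries between cons and prod, allowing for index wrap-around."""
    return (prod - cons) % ring_size


line = "kernel: sky2 eth0: transmit ring 402 .. 361 report=406 done=406"
cons, prod, report, done = parse_tx_timeout(line)
print(cons, prod, report, done)  # 402 361 406 406
print(hex(report))               # 0x196
```

The hex value is the useful part: 0x196 is the same number as the
"Transmit 1 done index 0x0196" line in the dump taken after the hang.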
> > 7) Please get a current version of ethtool from:
> > git://git.kernel.org/pub/scm/network/ethtool/ethtool.git
> > and run ethtool register dump after a problem occurs:
> > ethtool -d eth0
>
> this is the output after it stopped working:
>
>
> PCI config
> ----------
> 00: ab 11 62 43 07 04 18 00 15 00 00 02 08 00 00 00
> 10: 04 c0 df fd 00 00 00 00 01 ce 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 62 14 8c 05
> 30: 00 00 00 00 48 00 00 00 00 00 00 00 03 01 00 00
> 40: 00 00 f0 01 00 80 a0 01 01 50 02 fe 00 20 00 14
> 50: 03 5c 00 80 00 00 00 01 00 00 00 01 05 e0 83 00
> 60: 0c 10 e0 fe 00 00 00 00 61 41 00 00 00 00 00 00
> 70: 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Control Registers
> -----------------
> Register Access Port 0x00
> LED Control/Status 0xA603164A
> Interrupt Source 0x40000000
> Interrupt Mask 0xC000001D
> Interrupt Hardware Error Source 0x00000000
> Interrupt Hardware Error Mask 0x2E003F3F
>
> Bus Management Unit
> -------------------
> CSR Receive Queue 1 0x00010000
> CSR Sync Queue 1 0xFFFFFFFF
> CSR Async Queue 1 0x00000000
>
> MAC Addresses
> ---------------
> Addr 1 00 11 09 DA 39 A3
> Addr 2 00 11 09 DA 39 A3
> Addr 3 00 00 00 00 00 00
>
> Connector type 0x4A (J)
> PMD type 0x54 (T)
> PHY type 0x80
> Chip Id 0xB6 Yukon-2 EC
> (rev 0)
> Ram Buffer 0x0C
>
> Status BMU:
> -----------
> Control 0x0002220A
> Last Index 0x07FF
> Put Index 0x0601
> List Address 0x000000007FBF8000
> Transmit 1 done index 0x0196
> Transmit index threshold 0x000A
>
> Status FIFO
> Write Pointer 0x16
> Read Pointer 0x16
> Level 0x00
> Watermark 0x10
> ISR Watermark 0x10
> Status level
> Init 0x000030D4 Value 0x00000D00
> Test 0x04 Control 0x02
> TX status
> Init 0x0001E848 Value 0x0001E848
> Test 0x04 Control 0x02
> ISR
> Init 0x000009C4 Value 0x000009C4
> Test 0x04 Control 0x02
>
> GMAC control 0x005A
> GPHY control 0x2002
> LINK control 0x02
>
> GMAC 1
> Status 0xD000
> Control 0x1800
> Transmit 0x1000
> Receive 0xE000
> Transmit flow control 0xFFFF
> Transmit parameter 0xD7C4
> Serial mode 0x221E
> Source address: 00 11 09 DA 39 A3
> Physical address: 00 11 09 DA 39 A3
>
> Rx GMAC 1
> End Address 0x0000007F
> Almost Full Thresh 0x00000070
> Control/Test 0x0900228A
> FIFO Flush Mask 0x000018FB
> FIFO Flush Threshold 0x0000000B
> Truncation Threshold 0x0000017C
> Upper Pause Threshold 0x00000000
> Lower Pause Threshold 0x00000081
> VLAN Tag 0x00000074
> FIFO Write Pointer 0x00000000
> FIFO Write Level 0x0000007B
> FIFO Read Pointer 0x00000000
> FIFO Read Level 0x00000079
>
> Tx GMAC 1
> End Address 0x0000007F
> Almost Full Thresh 0x00000010
> Control/Test 0x0102220A
> FIFO Flush Mask 0x00000000
> FIFO Flush Threshold 0x00000000
> Truncation Threshold 0x00000000
> Upper Pause Threshold 0x00000000
> Lower Pause Threshold 0x00000081
> VLAN Tag 0x0000002A
> FIFO Write Pointer 0x0000002A
> FIFO Write Level 0x00000000
> FIFO Read Pointer 0x00000000
> FIFO Read Level 0x0000002A
>
> Receive Queue 1
> ---------------
> Buffer control 0x05F8
> Byte Counter 49408
> Descriptor Address 0x0000000076F4F810
> Status 0x05EA0100
> Timestamp 0x00000000
> BMU Control/Status 0x000061AA
> Done 0x0000
> Request 0x0000000076F4F810
> Csum1 Offset 52057 Piston 14
> Csum2 Offset 52057 Positing 14
>
> Sync Transmit Queue 1
> ---------------
> Descriptor Address 0x0000000000000000
> Address Counter 0x0000000000000000
> Current Byte Counter 0
> BMU Control/Status 0x00000000
> Flag & FIFO Address 0x00000000
>
> Control 0x00000000
> Next 0x00000000
> Data 0x0000000000000000
> Status 0x00000000
> Timestamp 0x00000000
> Csum Start 0x0000 Pos 0 Write 0
>
> Async Transmit Queue 1
> ---------------
> Buffer control 0x053D
> Byte Counter 49950
> Descriptor Address 0x0000000047237000
> Status 0x000005EA
> Timestamp 0x00010000
> BMU Control/Status 0x800011AA
> Done 0x0000
> Request 0x000000004723753D
> Csum Start 0x0032 Pos 0 Write 0
>
> Receive RAMbuffer 1
> ---------------
> Start Address 0x00000000
> End Address 0x00000E7F
> Write Pointer 0x00000079
> Read Pointer 0x0000007E
> Upper Threshold/Pause Packets 0x00000D80
> Lower Threshold/Pause Packets 0x000003A0
> Upper Threshold/High Priority 0x00000AE0
> Lower Threshold/High Priority 0x00000740
> Packet Counter 0x00000029
> Level 0x00000E7B
> Test 0x0002221A
>
> Sync Transmit RAMbuffer 1
> ---------------
> Start Address 0x00000000
> End Address 0x00000000
> Write Pointer 0x00000000
> Read Pointer 0x00000000
> Packet Counter 0x00000000
> Level 0x00000000
> Test 0x00000000
>
> Async Transmit RAMbuffer 1
> ---------------
> Start Address 0x00000E80
> End Address 0x000017FF
> Write Pointer 0x0000132A
> Read Pointer 0x0000132A
> Packet Counter 0x00000000
> Level 0x00000000
> Test 0x0002222A
>
> i don't know if it helps but i am also including the output of ethtool
> while the card was still working:
>
>
> PCI config
> ----------
> 00: ab 11 62 43 07 04 10 00 15 00 00 02 08 00 00 00
> 10: 04 c0 df fd 00 00 00 00 01 ce 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 62 14 8c 05
> 30: 00 00 00 00 48 00 00 00 00 00 00 00 03 01 00 00
> 40: 00 00 f0 01 00 80 a0 01 01 50 02 fe 00 20 00 14
> 50: 03 5c 00 80 00 00 00 01 00 00 00 01 05 e0 83 00
> 60: 0c 10 e0 fe 00 00 00 00 61 41 00 00 00 00 00 00
> 70: 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Control Registers
> -----------------
> Register Access Port 0x00
> LED Control/Status 0xA603164A
> Interrupt Source 0x00000000
> Interrupt Mask 0xC000001D
> Interrupt Hardware Error Source 0x00000000
> Interrupt Hardware Error Mask 0x2E003F3F
>
> Bus Management Unit
> -------------------
> CSR Receive Queue 1 0x00010000
> CSR Sync Queue 1 0xFFFFFFFF
> CSR Async Queue 1 0x00000000
>
> MAC Addresses
> ---------------
> Addr 1 00 11 09 DA 39 A3
> Addr 2 00 11 09 DA 39 A3
> Addr 3 00 00 00 00 00 00
>
> Connector type 0x4A (J)
> PMD type 0x54 (T)
> PHY type 0x80
> Chip Id 0xB6 Yukon-2 EC
> (rev 0)
> Ram Buffer 0x0C
>
> Status BMU:
> -----------
> Control 0x0002220A
> Last Index 0x07FF
> Put Index 0x00B8
> List Address 0x000000007FBF8000
> Transmit 1 done index 0x0057
> Transmit index threshold 0x000A
>
> Status FIFO
> Write Pointer 0x08
> Read Pointer 0x08
> Level 0x00
> Watermark 0x10
> ISR Watermark 0x10
> Status level
> Init 0x000030D4 Value 0x000030D4
> Test 0x04 Control 0x02
> TX status
> Init 0x0001E848 Value 0x0001E848
> Test 0x04 Control 0x02
> ISR
> Init 0x000009C4 Value 0x000009C4
> Test 0x04 Control 0x02
>
> GMAC control 0x005A
> GPHY control 0x2002
> LINK control 0x02
>
> GMAC 1
> Status 0xD000
> Control 0x1800
> Transmit 0x1000
> Receive 0xE000
> Transmit flow control 0xFFFF
> Transmit parameter 0xD7C4
> Serial mode 0x221E
> Source address: 00 11 09 DA 39 A3
> Physical address: 00 11 09 DA 39 A3
>
> Rx GMAC 1
> End Address 0x0000007F
> Almost Full Thresh 0x00000070
> Control/Test 0x0900228A
> FIFO Flush Mask 0x000018FB
> FIFO Flush Threshold 0x0000000B
> Truncation Threshold 0x0000017C
> Upper Pause Threshold 0x00000000
> Lower Pause Threshold 0x00000081
> VLAN Tag 0x00000027
> FIFO Write Pointer 0x00000000
> FIFO Write Level 0x00000000
> FIFO Read Pointer 0x00000000
> FIFO Read Level 0x00000027
>
> Tx GMAC 1
> End Address 0x0000007F
> Almost Full Thresh 0x00000010
> Control/Test 0x0102220A
> FIFO Flush Mask 0x00000000
> FIFO Flush Threshold 0x00000000
> Truncation Threshold 0x00000000
> Upper Pause Threshold 0x00000000
> Lower Pause Threshold 0x00000081
> VLAN Tag 0x00000032
> FIFO Write Pointer 0x00000032
> FIFO Write Level 0x00000000
> FIFO Read Pointer 0x00000000
> FIFO Read Level 0x00000032
>
> Receive Queue 1
> ---------------
> Buffer control 0x05F8
> Byte Counter 49408
> Descriptor Address 0x000000001727E010
> Status 0x003C0100
> Timestamp 0x00000000
> BMU Control/Status 0x000061AA
> Done 0x0000
> Request 0x000000001727E010
> Csum1 Offset 12632 Piston 14
> Csum2 Offset 12632 Positing 14
>
> Sync Transmit Queue 1
> ---------------
> Descriptor Address 0x0000000000000000
> Address Counter 0x0000000000000000
> Current Byte Counter 0
> BMU Control/Status 0x00000000
> Flag & FIFO Address 0x00000000
>
> Control 0x00000000
> Next 0x00000000
> Data 0x0000000000000000
> Status 0x00000000
> Timestamp 0x00000000
> Csum Start 0x0000 Pos 0 Write 0
>
> Async Transmit Queue 1
> ---------------
> Buffer control 0x06CC
> Byte Counter 49950
> Descriptor Address 0x0000000046AD23C6
> Status 0x000005EA
> Timestamp 0x00010000
> BMU Control/Status 0x800011AA
> Done 0x0000
> Request 0x0000000046AD2A92
> Csum Start 0x0032 Pos 0 Write 0
>
> Receive RAMbuffer 1
> ---------------
> Start Address 0x00000000
> End Address 0x00000E7F
> Write Pointer 0x00000427
> Read Pointer 0x00000427
> Upper Threshold/Pause Packets 0x00000D80
> Lower Threshold/Pause Packets 0x000003A0
> Upper Threshold/High Priority 0x00000AE0
> Lower Threshold/High Priority 0x00000740
> Packet Counter 0x00000000
> Level 0x00000000
> Test 0x0002221A
>
> Sync Transmit RAMbuffer 1
> ---------------
> Start Address 0x00000000
> End Address 0x00000000
> Write Pointer 0x00000000
> Read Pointer 0x00000000
> Packet Counter 0x00000000
> Level 0x00000000
> Test 0x00000000
>
> Async Transmit RAMbuffer 1
> ---------------
> Start Address 0x00000E80
> End Address 0x000017FF
> Write Pointer 0x000017B2
> Read Pointer 0x000017B2
> Packet Counter 0x00000000
> Level 0x00000000
> Test 0x0002222A
>
> i'll try to lock up the networking again and if it still happens i'll
> switch to the vendor driver and see what that has to say.
>
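
The two dumps are long and mostly identical, so a small script helps to spot
the registers that changed. This is only a sketch: diff_dumps and strip_quote
are made-up helper names, and it assumes both dumps have exactly the same
layout, line for line, which holds for the two quoted above.

```python
def strip_quote(line):
    """Drop leading '>' quote markers and surrounding whitespace."""
    return line.lstrip("> ").strip()


def diff_dumps(working, stuck):
    """Yield (line_no, working_line, stuck_line) where the dumps differ.

    Both arguments are the full text of an `ethtool -d` dump. The
    comparison is positional, so the dumps must have equal line counts.
    """
    a = [strip_quote(l) for l in working.splitlines()]
    b = [strip_quote(l) for l in stuck.splitlines()]
    if len(a) != len(b):
        raise ValueError("dumps have different line counts")
    for n, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            yield n, x, y


# Three lines taken from each of the dumps quoted above.
working = """> Interrupt Source 0x00000000
> Interrupt Mask 0xC000001D
> Put Index 0x00B8
"""
stuck = """> Interrupt Source 0x40000000
> Interrupt Mask 0xC000001D
> Put Index 0x0601
"""

for n, before, after in diff_dumps(working, stuck):
    print(n, before, "->", after)
```

On a running interface the index and pointer registers will always differ,
so the interesting lines are the ones that are not plain counters, such as
Interrupt Source here.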
Another useful bit of information is the statistics (ethtool -S eth0).
When there were flow control bugs, they would show up as count of 1.
Are you doing jumbo frames (MTU > 1500)?
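
If the statistics list is long, something like the following Python sketch
will cut it down to the counters that are non-zero, which is where a stray
count of 1 would stand out. It assumes the usual "name: value" lines that
ethtool -S prints. The counter names in the sample are made up for
illustration and are not taken from this card.

```python
def nonzero_stats(text):
    """Return {name: value} for every non-zero 'name: value' counter line."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Skip the header line and anything that is not a plain integer.
        if sep and value.isdigit() and int(value) != 0:
            stats[name.strip()] = int(value)
    return stats


# Hypothetical sample; real names and values come from `ethtool -S eth0`.
sample = """NIC statistics:
     tx_bytes: 123456
     rx_bytes: 654321
     tx_mac_pause: 0
     rx_mac_pause: 1
     rx_fcs_error: 0
"""

for name, value in nonzero_stats(sample).items():
    print(name, value)
```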
--
Stephen Hemminger <shemminger@...l.org>