Message-ID: <487AFE97.8090107@embedded-sol.com>
Date: Mon, 14 Jul 2008 10:21:59 +0300
From: Felix Radensky <felix@...edded-sol.com>
To: "Brandeburg, Jesse" <jesse.brandeburg@...el.com>
CC: netdev@...r.kernel.org, Stefan Roese <sr@...x.de>
Subject: Re: e1000e "Detected Tx Unit Hang"
Hi, Jesse
I'm CC-ing Stefan, who ported Linux to this platform.
Applying the patch you suggested did not help. I'm still getting TX unit
hangs. I don't see any netdev watchdog messages. When the hang occurs
I cannot get the prompt, so I cannot run ethtool.
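Since the prompt is lost after the hang, one way to double-check for the two messages is to scan a saved console log afterwards. The following is only a minimal sketch: it assumes the log was captured to a text file (for example from a serial console), the function name `count_messages` is made up for illustration, and the watchdog pattern is deliberately loose because the exact wording of that line differs between kernel versions.

```python
import re

# Patterns for the two messages discussed in this thread.
# Assumption: matching on these substrings is enough; the full text of the
# watchdog line is not quoted anywhere in this thread.
PATTERNS = {
    "tx_unit_hang": re.compile(r"Detected Tx Unit Hang", re.IGNORECASE),
    "netdev_watchdog": re.compile(r"NETDEV WATCHDOG", re.IGNORECASE),
}


def count_messages(log_text):
    """Return how many log lines match each pattern."""
    counts = {name: 0 for name in PATTERNS}
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Feeding it the contents of the captured log should give a non-zero `tx_unit_hang` count and, if the observation above holds, a zero `netdev_watchdog` count.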
The following command always works:
dd if=/dev/zero of=/mnt/test bs=512 count=2
and the following causes a Tx unit hang:
dd if=/dev/zero of=/mnt/test bs=512 count=3
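The two commands differ only in the amount written: 2 x 512 = 1024 bytes works, 3 x 512 = 1536 bytes hangs. A rough sketch of a sweep over block counts is below, in case it helps narrow the threshold further. It is an approximation of the dd runs, not a replacement: the helper names `write_blocks` and `sweep` are invented for this example, and the file name `test` under the target directory simply mirrors the commands above.

```python
import os


def write_blocks(path, count, block_size=512):
    """Write `count` zero-filled blocks to `path`, roughly like dd if=/dev/zero."""
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(b"\0" * block_size)
        f.flush()
        os.fsync(f.fileno())  # ask for the data to be pushed out before returning
    return count * block_size


def sweep(directory, max_count=8, block_size=512):
    """Try counts 1..max_count, announcing each size before it is written."""
    results = []
    for count in range(1, max_count + 1):
        # Print first: if the machine hangs, the last line shown is the culprit.
        print(f"count={count} bytes={count * block_size}", flush=True)
        written = write_blocks(os.path.join(directory, "test"), count, block_size)
        results.append((count, written))
    return results
```

Calling `sweep("/mnt")` on the NFS mount prints one line per attempt, so the last line visible on the console before the hang gives the failing size.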
Stefan, are you aware of any PCIe-related problems on Canyonlands?
AMCC has a compatibility chart on their site, which indicates that this
particular card (Intel PRO/1000 PT Desktop Adapter) was tested with the
linux-2.6.25 kernel.
Thanks a lot.
Felix.
Brandeburg, Jesse wrote:
> Felix Radensky wrote:
>
>> Hi, Jesse
>>
>> I can confirm that I'm also getting these errors with 2.6.26-rc8 on
>> PowerPC platform (AMCC 460EX CPU). The Intel adapter is (as reported
>> by lspci -vv)
>>
>
> Interesting, I haven't heard back from Herbert, but thanks for the
> reply.
>
> are you getting the NETDEV WATCHDOG messages in your log? does ethtool
> -S show any tx_timeout?
>
> can you try applying a patch similar to
> https://sourceforge.net/tracker/download.php?group_id=42302&atid=447449&file_id=283326&aid=2007017
>
> aka http://tinyurl.com/5vl5g4
>
>
>
>
>> 41:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit
>> Ethernet Controller (Copper) (rev 06)
>> Subsystem: Intel Corporation PRO/1000 PT Desktop Adapter
>>
>
> x1 PCIe adapter
>
>
>> Some relevant output from dmesg:
>>
>> e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k2
>> e1000e: Copyright (c) 1999-2008 Intel Corporation.
>> e1000e 0000:41:00.0: enabling device (0006 -> 0007)
>> eth2: (PCI Express:2.5GB/s:Width x1) 00:1b:21:1e:2d:2a
>> eth2: Intel(R) PRO/1000 Network Connection
>> eth2: MAC: 1, PHY: 4, PBA No: d50854-003
>> eth2: Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
>> eth2: 10/100 speed: disabling TSO
>>
>> I can reliably reproduce the problem by running
>>
>> dd if=/dev/zero of=/mnt/1M bs=1024 count=1024
>>
>> where /mnt is mounted over NFS with the following options (default
>> ones)
>>
>>
>> rw,vers=3,rsize=32768,wsize=32768,namlen=255,hard,nointr,nolock,proto=udp,timeo=7,retrans=3,sec=sys,mountproto=udp,addr
>
>> Below is register dump produced by patched driver.
>>
>> eth2: Detected Tx Unit Hang:
>> TDH <25>
>> TDT <25>
>>
>
> Hardware completed all the packets, but no writebacks made it back to
> main memory.
>
>
>> TX Desc ring0 dump
>> Tl[0x000] 0000000000000000 0000000000000000 000000001D734802 0022 2 00000000FFFFD0FE 00000000 NTC
>>
>
> Ewww, even worse, it seems that something zeroed out the memory in the
> tx descriptor ring. I strongly suspect something bad at your
> system/chipset level.
>
>
>
>> Tl[0x001] 0000000000000000 0000000000000000 0000000015FE2A84 057C 1 00000000FFFFD0FE 00000000
>> Tl[0x002] 0000000000000000 0000000000000000 0000000015FA1000 004C 2 00000000FFFFD0FE dd739f00
>> Tl[0x003] 0000000000000000 0000000000000000 000000001D734A02 0022 4 00000000FFFFD0FE 00000000
>> Tl[0x004] 0000000000000000 0000000000000000 0000000015FA104C 05C8 4 00000000FFFFD0FE dd739c80
>> Tl[0x005] 0000000000000000 0000000000000000 000000001D734C02 0022 6 00000000FFFFD0FE 00000000
>> Tl[0x006] 0000000000000000 0000000000000000 0000000015FA1614 05C8 6 00000000FFFFD0FE dd739280
>> Tl[0x007] 0000000000000000 0000000000000000 000000001D734E02 0022 9 00000000FFFFD0FE 00000000
>> Tl[0x008] 0000000000000000 0000000000000000 0000000015FA1BDC 0424 8 00000000FFFFD0FE 00000000
>> Tl[0x009] 0000000000000000 0000000000000000 0000000015EC6000 01A4 9 00000000FFFFD0FE dd7390a0
>> Tl[0x00A] 0000000000000000 0000000000000000 000000001D73A002 0022 B 00000000FFFFD0FE 00000000
>> Tl[0x00B] 0000000000000000 0000000000000000 0000000015EC61A4 05C8 B 00000000FFFFD0FE dd739e60
>> Tl[0x00C] 000000001D73A202 0000000002000022 000000001D73A202 0022 D 00000000FFFFD0FE 00000000
>>
>
> Possibly the driver is half done cleaning up, but that doesn't seem
> likely because the driver does not zero the first two 64-bit columns;
> also, the last column, which contains an skb pointer, still indicates
> cleanup hasn't completed.
>
> Does this card work at all in your system?
>
> Jesse
>
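The check Jesse describes on the quoted descriptor dump can be sketched in a few lines. This is an illustration under stated assumptions, not the driver's own logic: the column meanings (first two 64-bit words are the descriptor, last column is the skb pointer) are taken from Jesse's comments, the field names are my own labels, and the parser expects each `Tl[...]` entry on a single line with the quote markers stripped. The three sample lines are copied from the dump.

```python
import re

# One dump entry: index, descriptor words a and b, a third 64-bit column,
# length, a short hex column, another 64-bit column, and the skb pointer.
# Only the columns Jesse comments on are interpreted below.
LINE_RE = re.compile(
    r"Tl\[0x([0-9A-Fa-f]+)\]\s+([0-9A-Fa-f]{16})\s+([0-9A-Fa-f]{16})\s+"
    r"([0-9A-Fa-f]{16})\s+([0-9A-Fa-f]{4})\s+([0-9A-Fa-f]+)\s+"
    r"([0-9A-Fa-f]{16})\s+([0-9A-Fa-f]{8})"
)

SAMPLE = """
Tl[0x000] 0000000000000000 0000000000000000 000000001D734802 0022 2 00000000FFFFD0FE 00000000 NTC
Tl[0x002] 0000000000000000 0000000000000000 0000000015FA1000 004C 2 00000000FFFFD0FE dd739f00
Tl[0x00C] 000000001D73A202 0000000002000022 000000001D73A202 0022 D 00000000FFFFD0FE 00000000
"""


def parse_dump(text):
    """Turn dump text into a list of dicts, one per Tl[...] entry."""
    entries = []
    for m in LINE_RE.finditer(text):
        idx, a, b, _col3, length, _col5, _col6, skb = m.groups()
        entries.append({
            "index": int(idx, 16),
            "desc_zeroed": int(a, 16) == 0 and int(b, 16) == 0,
            "length": int(length, 16),
            "skb": int(skb, 16),
        })
    return entries


def suspicious(entries):
    """Indices whose descriptor words are zero while an skb is still attached."""
    return [e["index"] for e in entries if e["desc_zeroed"] and e["skb"] != 0]
```

On the three sample lines, `suspicious(parse_dump(SAMPLE))` flags only entry 0x002: its descriptor words are zero, yet it still carries the skb pointer dd739f00, which is the combination Jesse calls out.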