Message-ID: <20070130111520.47e07a79@freekitty>
Date: Tue, 30 Jan 2007 11:15:20 -0800
From: Stephen Hemminger <shemminger@...ux-foundation.org>
To: Chris Lightfoot <chris@...parrot.com>
Cc: Stephen Hemminger <shemminger@...l.org>, netdev@...r.kernel.org
Subject: Re: sky2 problems on Intel Mac Mini
There are a couple of problems here:
1) the transmitter is getting hung.
2) the recovery logic doesn't work. If I can reproduce the hang,
then maybe the recovery code can be fixed.
Let's address the transmitter hang first.
The transmitter has multiple stages so it could be either:
a) hardware flow control problems
Look at the 'ethtool -S eth0' statistics: are there flow control packets
showing up?
b) GMAC or ram buffer issues
Looking at 'ethtool -d eth0' output can help, but finding these setup
errors is like looking for a needle in a haystack.
The sky2 driver copies most of its setup from the vendor version of sk98lin,
but if sk98lin works and sky2 doesn't, then comparing register settings
can give hints.
c) DMA problems
For some problems, I have had luck adding a /proc interface and dumping
the transmit ring after a hang. Looking at the last control block that
hung can help. This found the case where IPv6 TSO was leaking through.
d) checksum problems
Turning off tx scatter/gather forces non-fragmented skbs. This hurts
performance, but can tell if the problem is with the fragment code.
Turning off tx checksum turns off scatter/gather, checksumming and
TSO.
e) possible alignment and flow control interaction
The receive DMA engine has hardware bugs and requires alignment,
or it doesn't work with flow control. I still wonder if there are alignment
bugs on Tx with flow control.
f) other driver bug
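
A rough sketch of checks (a) and (d) above, to be adapted rather than
taken as is. It assumes the interface is eth0 and that the driver's flow
control counters have "pause" somewhere in their names; the helper names
pause_counters and offload_steps are made up for illustration. The second
helper only prints the ethtool commands, it does not run them, and the
order of the steps is just one reasonable choice.

```shell
#!/bin/sh
# Hypothetical helpers; "eth0" and the "pause" name pattern are assumptions.

# (a) Keep only the pause (flow control) counters from 'ethtool -S'
# output read on stdin. Returns success even when nothing matches.
pause_counters() {
    grep -i 'pause' || true
}

# (d) Print the commands that turn Tx offloads off one at a time, so each
# step can be run by hand and the hang re-tested in between.
offload_steps() {
    dev=${1:-eth0}
    echo "ethtool -K $dev tso off"   # TSO only
    echo "ethtool -K $dev sg off"    # scatter/gather: non-fragmented skbs
    echo "ethtool -K $dev tx off"    # tx checksum (also drops sg and TSO)
}
```

Typical use would be 'ethtool -S eth0 | pause_counters' once before and
once after a hang, to see whether the counters moved, and
'offload_steps eth0' to get the list of commands to try.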
To save time, I'll go get a new Mac Mini and try to clone this setup.
Could you send me a full kernel config (and other setup information
like filesystem type, distro, etc.)?
> -- I assume this is just the same problem exhibiting on a
> kernel with soft lockups detection enabled?
>
> Hopefully I should be able to actually log into one of
> these machines over an alternate connection next time the
> problem recurs, at which point I should be able to get
> ethtool -d output. Anything else I should do at that
> point?
>
> Any suggestions for what to do next to chase this problem
> down? I haven't yet tried the sk98lin driver on this
> hardware; is that still worth doing? Are there any useful
> tests we should try? Unfortunately, though these crashes
> happen pretty frequently (several times per day
> typically), I don't have a test case to reproduce one;
> however, if it'd be useful, I can probably get a pcap
> trace of the period immediately before the interface falls
> over using port mirroring on the switch to which the
> machines are connected. Is that likely to be informative?
>
The vendor driver does some slightly different setup, but it also
does a hardware reset when inactive (every 10ms).
--
Stephen Hemminger <shemminger@...ux-foundation.org>