Message-ID: <6278d2220707110843i16d3a325nebec8cb766a40a5e@mail.gmail.com>
Date: Wed, 11 Jul 2007 16:43:57 +0100
From: "Daniel J Blueman" <daniel.blueman@...il.com>
To: "Stephen Hemminger" <shemminger@...ux-foundation.org>
Cc: "Linux Netdev" <netdev@...r.kernel.org>
Subject: Re: sky2 hangs without any messages
On 11/07/07, Stephen Hemminger <shemminger@...ux-foundation.org> wrote:
> On Wed, 11 Jul 2007 11:15:20 +0100
> "Daniel J Blueman" <daniel.blueman@...il.com> wrote:
>
> > On 05/07/07, Stephen Hemminger <shemminger@...ux-foundation.org> wrote:
> > > Well, it didn't fix my test, but it made it better. The following seemed
> > > to work longer...
> > >
> > > --- a/drivers/net/sky2.c 2007-07-05 09:09:45.000000000 -0700
> > > +++ b/drivers/net/sky2.c 2007-07-05 09:09:51.000000000 -0700
> > > @@ -2490,6 +2490,13 @@ static int sky2_poll(struct net_device *
> > >
> > > work_done = sky2_status_intr(hw, work_limit);
> > > if (work_done < work_limit) {
> > > + /* Bug/Errata workaround?
> > > + * Need to kick the TX irq moderation timer.
> > > + */
> > > + if (sky2_read8(hw, STAT_TX_TIMER_CTRL) == TIM_START) {
> > > + sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_STOP);
> > > + sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_START);
> > > + }
> > > netif_rx_complete(dev0);
> > >
> > > /* end of interrupt, re-enables also acts as I/O synchronization */
> >
> > I spoke too soon on this. With the above patch on 2.6.22-rc7, it
> > failed much sooner than the previous patch with the
> > read32(B0_Y2_SP_LISR); I'll try to reproduce with the older patch.
> >
> > Note the ifconfig error/dropped/frame count at the time of failure:
> >
> > # ethtool -g lan0
> > Ring parameters for lan0:
> > Pre-set maximums:
> > RX: 168
> > RX Mini: 0
> > RX Jumbo: 0
> > TX: 511
> > Current hardware settings:
> > RX: 168
> > RX Mini: 0
> > RX Jumbo: 0
> > TX: 511
> >
> > # ethtool -a lan0
> > Pause parameters for lan0:
> > Autonegotiate: on
> > RX: on
> > TX: on
> >
> > # ethtool -c lan0
> > Coalesce parameters for lan0:
> > Adaptive RX: off TX: off
> > stats-block-usecs: 0
> > sample-interval: 0
> > pkt-rate-low: 0
> > pkt-rate-high: 0
> >
> > rx-usecs: 100
> > rx-frames: 16
> > rx-usecs-irq: 20
> > rx-frames-irq: 16
> >
> > tx-usecs: 1000
> > tx-frames: 10
> > tx-usecs-irq: 0
> > tx-frames-irq: 0
> >
> > rx-usecs-low: 0
> > rx-frame-low: 0
> > tx-usecs-low: 0
> > tx-frame-low: 0
> >
> > rx-usecs-high: 0
> > rx-frame-high: 0
> > tx-usecs-high: 0
> > tx-frame-high: 0
> >
> > # ethtool -k lan0
> > Offload parameters for lan0:
> > Cannot get device udp large send offload settings: Operation not supported
> > rx-checksumming: on
> > tx-checksumming: on
> > scatter-gather: on
> > tcp segmentation offload: on
> > udp fragmentation offload: off
> > generic segmentation offload: off
> >
> > # ethtool -S lan0
> > NIC statistics:
> > tx_bytes: 2624901638
> > rx_bytes: 125131827
> > tx_broadcast: 177
> > rx_broadcast: 245
> > tx_multicast: 0
> > rx_multicast: 0
> > tx_unicast: 1818345
> > rx_unicast: 973657
> > tx_mac_pause: 0
> > rx_mac_pause: 0
> > collisions: 0
> > late_collision: 0
> > aborted: 0
> > single_collisions: 0
> > multi_collisions: 0
> > rx_short: 0
> > rx_runt: 0
> > rx_64_byte_packets: 2475
> > rx_65_to_127_byte_packets: 891841
> > rx_128_to_255_byte_packets: 3748
> > rx_256_to_511_byte_packets: 42082
> > rx_512_to_1023_byte_packets: 3133
> > rx_1024_to_1518_byte_packets: 30623
> > rx_1518_to_max_byte_packets: 0
> > rx_too_long: 0
> > rx_fifo_overflow: 0
> > rx_jabber: 0
> > rx_fcs_error: 0
> > tx_64_byte_packets: 1429
> > tx_65_to_127_byte_packets: 35881
> > tx_128_to_255_byte_packets: 17013
> > tx_256_to_511_byte_packets: 25872
> > tx_512_to_1023_byte_packets: 30901
> > tx_1024_to_1518_byte_packets: 1707426
> > tx_1519_to_max_byte_packets: 0
> > tx_fifo_underrun: 0
> >
> > # ifconfig lan0
> > lan0 Link encap:Ethernet HWaddr 00:03:2D:05:9C:27
> > inet addr:192.168.0.250 Bcast:192.168.0.255 Mask:255.255.255.0
> > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> > RX packets:973893 errors:1 dropped:1 overruns:0 frame:1
> > TX packets:819179 errors:0 dropped:0 overruns:0 carrier:0
> > collisions:0 txqueuelen:1000
> > RX bytes:107601061 (102.6 MiB) TX bytes:2551658362 (2.3 GiB)
> > Interrupt:16
> >
> > # dmesg
> > ...
> > ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 16
> > PCI: Setting latency timer of device 0000:01:00.0 to 64
> > sky2 0000:01:00.0: v1.14 addr 0xdfbfc000 irq 16 Yukon-EC (0xb6) rev 1
> > sky2 eth1: addr 00:03:2d:05:9c:27
> > sky2 lan0: enabling interface
> > sky2 lan0: ram buffer 48K
> > sky2 lan0: Link is up at 1000 Mbps, full duplex, flow control both
> > ...
> > lan0: hw csum failure.
> > [<b02b707c>] __skb_checksum_complete_head+0x5c/0x60
> > [<b02b7088>] __skb_checksum_complete+0x8/0x10
> > [<b0313aab>] nf_ip_checksum+0xbb/0x130
> > [<b02d8b9c>] udp_error+0x13c/0x1b0
> > [<b02ba4cd>] dev_hard_start_xmit+0x1cd/0x230
> > [<b02e93c0>] ip_finish_output+0x0/0x260
> > [<b02d8a60>] udp_error+0x0/0x1b0
> > [<b02d5736>] nf_conntrack_in+0xf6/0x4d0
> > [<b02bbe85>] dev_queue_xmit+0x95/0x260
> > [<b02eac51>] ip_output+0x141/0x2e0
> > [<b02e93c0>] ip_finish_output+0x0/0x260
> > [<b02ea20f>] ip_queue_xmit+0x1cf/0x3d0
> > [<b02e7cd0>] dst_output+0x0/0x10
> > [<b02d33a3>] nf_iterate+0x63/0x90
> > [<b02e4fb0>] ip_rcv_finish+0x0/0x280
> > [<b02d3519>] nf_hook_slow+0x59/0xe0
> > [<b02e4fb0>] ip_rcv_finish+0x0/0x280
> > [<b02e5740>] ip_rcv+0x2f0/0x4d0
> > [<b02e4fb0>] ip_rcv_finish+0x0/0x280
> > [<b0321d56>] packet_rcv_spkt+0xe6/0x180
> > [<b02b9f38>] netif_receive_skb+0x1f8/0x2e0
> > [<f0840db1>] sky2_poll+0x351/0x9c0 [sky2]
> > [<b01206b4>] run_timer_softirq+0x124/0x180
> > [<b02bbc6c>] net_rx_action+0x5c/0x100
> > [<b011dd62>] __do_softirq+0x42/0x90
> > [<b010642c>] do_softirq+0x5c/0xb0
> > [<b0139e30>] handle_edge_irq+0x0/0xe0
> > [<b011dc8a>] irq_exit+0x5a/0x60
> > [<b01064ec>] do_IRQ+0x6c/0xb0
> > [<b0104807>] common_interrupt+0x23/0x28
> > [<b0420000>] xt_tcpudp_init+0x0/0x10
> > [<b0102c9a>] default_idle+0x2a/0x40
> > [<b01023d3>] cpu_idle+0x43/0x70
> > [<b0404b25>] start_kernel+0x215/0x2a0
> > [<b0404450>] unknown_bootoption+0x0/0x260
>
> The last message means some how frame was received with checksum for count
> wrong. I have only seen it when coalescing is messed up.
>
> I ran for 2+ days with the patch, and only 20min without. Usually my ISP connection
> gives up after that because of crappy DSL box, and that makes DNS not work.

It wedged while I was copying a few GB of data from my server to a
local disk and, at the same time, running rsync over ssh on a large
file from my server to my laptop's disk.

This is the typical load that causes the NIC to lock up, whether from
a missed IRQ or something else. However, it did feel like the new code
didn't un-wedge the Yukon-EC's bus master unit.

What other tricks can be used to reset the Yukon-EC's bus master unit?
I'll try the read32(B0_Y2_SP_LISR) trick, as before.
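
For anyone following the quoted patch, here is a minimal userspace
sketch of the timer-kick logic it adds, run against a mocked register
file rather than real hardware. Everything prefixed mock_/MOCK_ is
hypothetical: the register offset and bit values are placeholders, not
the definitions from drivers/net/sky2.h, and kick_tx_timer() is just an
illustrative name for the block inside sky2_poll().

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values for illustration only; the real definitions live in
 * drivers/net/sky2.h and are not reproduced here. */
enum {
	MOCK_STAT_TX_TIMER_CTRL = 0x10,	/* hypothetical register offset */
	MOCK_TIM_STOP  = 1 << 1,	/* hypothetical "stop timer" value */
	MOCK_TIM_START = 1 << 2,	/* hypothetical "start timer" value */
};

/* Stand-in for the chip's mapped register window. */
struct mock_hw {
	uint8_t regs[256];
	unsigned writes;	/* counts register writes, for inspection */
};

static uint8_t mock_read8(const struct mock_hw *hw, unsigned reg)
{
	assert(reg < sizeof hw->regs);
	return hw->regs[reg];
}

static void mock_write8(struct mock_hw *hw, unsigned reg, uint8_t val)
{
	assert(reg < sizeof hw->regs);
	hw->regs[reg] = val;
	hw->writes++;
}

/* Same shape as the quoted patch: if the TX moderation timer control
 * register reads back as "started", stop it and start it again.
 * Returns 1 if the timer was kicked, 0 if it was left alone. */
static int kick_tx_timer(struct mock_hw *hw)
{
	if (mock_read8(hw, MOCK_STAT_TX_TIMER_CTRL) != MOCK_TIM_START)
		return 0;

	mock_write8(hw, MOCK_STAT_TX_TIMER_CTRL, MOCK_TIM_STOP);
	mock_write8(hw, MOCK_STAT_TX_TIMER_CTRL, MOCK_TIM_START);
	return 1;
}
```

The mock only shows the control flow: the kick is skipped unless the
register reads back exactly as the start value, and the register ends
up in the started state again afterwards. It says nothing about how the
real timer, or the bus master unit, reacts to the stop/start sequence.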
Daniel
--
Daniel J Blueman