Message-ID: <6278d2220707111439r5ea69a29v51cdbef1cbb7ab25@mail.gmail.com>
Date: Wed, 11 Jul 2007 22:39:49 +0100
From: "Daniel J Blueman" <daniel.blueman@...il.com>
To: "Stephen Hemminger" <shemminger@...ux-foundation.org>
Cc: "Linux Netdev" <netdev@...r.kernel.org>
Subject: Re: sky2 hangs without any messages
On 11/07/07, Daniel J Blueman <daniel.blueman@...il.com> wrote:
> > > On 05/07/07, Stephen Hemminger <shemminger@...ux-foundation.org> wrote:
> > > > Well, it didn't fix my test, but it made it better. The following seemed
> > > > to work longer...
> > > >
> > > > --- a/drivers/net/sky2.c 2007-07-05 09:09:45.000000000 -0700
> > > > +++ b/drivers/net/sky2.c 2007-07-05 09:09:51.000000000 -0700
> > > > @@ -2490,6 +2490,13 @@ static int sky2_poll(struct net_device *
> > > >
> > > > work_done = sky2_status_intr(hw, work_limit);
> > > > if (work_done < work_limit) {
> > > > + /* Bug/Errata workaround?
> > > > + * Need to kick the TX irq moderation timer.
> > > > + */
> > > > + if (sky2_read8(hw, STAT_TX_TIMER_CTRL) == TIM_START) {
> > > > + sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_STOP);
> > > > + sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_START);
> > > > + }
> > > > netif_rx_complete(dev0);
> > > >
> > > > /* end of interrupt, re-enables also acts as I/O synchronization */
> > >
> > > I spoke too soon on this. With the above patch on 2.6.22-rc7, it
> > > failed much sooner than the previous patch with the
> > > read32(B0_Y2_SP_LISR); I'll try to reproduce with the older patch.
> > >
> > > Note the ifconfig error/dropped/frame count at the time of failure:
[snip]
> > The last message means some how frame was received with checksum for count
> > wrong. I have only seen it when coalescing is messed up.
> >
> > I ran for 2+ days with the patch, and only 20min without. Usually my ISP connection
> > gives up after that because of crappy DSL box, and that makes DNS not work.
>
> It wedged when I was copying a few GBs of data from my server to a
> local disk at the time, and running rsync over ssh on a large file on
> my server to my laptop's disk.
>
> This would be the typical load that would cause the NIC to lock up from
> missing an IRQ or otherwise. However, it did feel like the new code
> didn't un-wedge the Yukon-EC's bus master unit.
>
> What other tricks can be used to reset the Yukon-EC's bus master unit?
>
> I'll try the read32(B0_Y2_SP_LISR) trick, as before.

Nope, this still locks up, as you found.

I have a reliable reproducer:
1. export directory over NFS TCP on server
2. mount directory on client
3. run 'iozone -a' in directory on client
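
The three steps above can be sketched as a small shell script. This is only an illustration: the server name, export path and mount point are placeholders that do not come from this thread, the export options are an assumption, and by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the reproducer steps. All names below are placeholders
# (assumptions), not values taken from the original report.
SERVER="${SERVER:-nfs-server}"           # hypothetical server host name
EXPORT_DIR="${EXPORT_DIR:-/srv/test}"    # hypothetical exported directory
MOUNT_POINT="${MOUNT_POINT:-/mnt/test}"  # hypothetical client mount point
DRY_RUN="${DRY_RUN:-1}"                  # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1 (on the server): export the directory over NFS.
# The "rw" option and the "*" client wildcard are assumptions.
server_export() {
    run exportfs -o rw "*:$EXPORT_DIR"
}

# Step 2 (on the client): mount the export with NFSv4 over TCP.
client_mount() {
    run mount -t nfs4 -o proto=tcp "$SERVER:$EXPORT_DIR" "$MOUNT_POINT"
}

# Step 3 (on the client): run iozone in automatic mode inside the mount.
client_load() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "cd $MOUNT_POINT && iozone -a"
    else
        (cd "$MOUNT_POINT" && iozone -a)
    fi
}
```

With DRY_RUN=0 and root privileges, server_export would be called on the server, then client_mount followed by client_load on the client.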

I'm reproducing this with NFSv4 (with callbacks working), a 1500-octet
MTU and a single client, all on gigabit links. It would be good to hear
whether you can reproduce the problem on your side.

Daniel
--
Daniel J Blueman