Message-ID: <20080617092706.GB20621@elte.hu>
Date: Tue, 17 Jun 2008 11:27:06 +0200
From: Ingo Molnar <mingo@...e.hu>
To: David Miller <davem@...emloft.net>
Cc: kuznet@....inr.ac.ru, vgusev@...nvz.org, mcmanus@...ksong.com,
xemul@...nvz.org, netdev@...r.kernel.org,
ilpo.jarvinen@...sinki.fi, linux-kernel@...r.kernel.org,
e1000-devel@...ts.sourceforge.net, rjw@...k.pl
Subject: Re: [TCP]: TCP_DEFER_ACCEPT causes leak sockets

* David Miller <davem@...emloft.net> wrote:
> From: Ingo Molnar <mingo@...e.hu>
> Date: Tue, 17 Jun 2008 10:32:20 +0200
>
> > those up to 1000 msec delays can be 'felt' via ssh too, if this
> > problem triggers then the system is almost unusable via the network.
> > Local latencies are perfect so it's an e1000 problem.
>
> Or some kind of weird interrupt problem.
>
> Such an interrupt level bug would also account for the TX timeout's
> you're seeing btw.

when i originally reported it i debugged it back to missing e1000 TX
completion IRQs. I tried various versions of the driver to figure out
whether new workarounds for e1000 cover it but it was fruitless. There
is a 1000 msec internal watchdog timer IRQ within e1000 that gets things
going if it's stuck.

But the line sch_generic.c:222 problem is new. It could be an
escalation of this same problem - not even the hw-internal watchdog
timeout fixing up things? So basically two levels of completion failed,
the third fallback level (a hard reset of the interface) helped things
get going. High score from me for networking layer robustness :-)
Ingo
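
The three fallback levels described above (TX completion IRQ, the
device's internal watchdog, then a hard reset of the interface) can be
sketched as a toy model. This is only an illustration under stated
assumptions, not e1000 driver or kernel code: the names TxSim and
recover are hypothetical, and the 5000 msec stack-level timeout is an
assumed value. Only the 1000 msec watchdog period comes from the message.

```python
from dataclasses import dataclass
from typing import Tuple

HW_WATCHDOG_MS = 1000  # from the message: ~1000 msec internal watchdog
TX_TIMEOUT_MS = 5000   # ASSUMPTION: illustrative stack-level TX timeout


@dataclass
class TxSim:
    """Hypothetical toy model of a NIC TX path (not a real driver API)."""
    irq_works: bool          # level 1: TX completion interrupt is delivered
    hw_watchdog_works: bool  # level 2: device watchdog unsticks the queue
    pending: int = 0         # packets waiting for TX completion
    resets: int = 0          # number of hard interface resets performed


def recover(sim: TxSim, pending: int) -> Tuple[str, int]:
    """Return (level that completed the packets, worst-case delay in ms)."""
    sim.pending = pending
    if sim.irq_works:
        # Normal case: the completion IRQ arrives, no extra latency.
        sim.pending = 0
        return ("irq", 0)
    if sim.hw_watchdog_works:
        # IRQ was lost: packets sit until the periodic watchdog fires.
        sim.pending = 0
        return ("watchdog", HW_WATCHDOG_MS)
    # Both levels failed: the stack times out and resets the interface.
    sim.resets += 1
    sim.pending = 0
    return ("reset", TX_TIMEOUT_MS)


if __name__ == "__main__":
    for irq, wd in [(True, True), (False, True), (False, False)]:
        print(irq, wd, recover(TxSim(irq, wd), pending=3))
```

In this model the second case corresponds to the delays of up to
1000 msec felt over ssh, and the third to the new case where only the
interface reset gets things going again.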