Message-Id: <20070413.222941.123978678.davem@davemloft.net>
Date: Fri, 13 Apr 2007 22:29:41 -0700 (PDT)
From: David Miller <davem@...emloft.net>
To: torvalds@...ux-foundation.org
Cc: bunk@...sta.de, akpm@...ux-foundation.org, jgarzik@...ox.com,
netdev@...r.kernel.org, e1000-devel@...ts.sourceforge.net,
mingo@...e.hu, aabdulla@...dia.com, davej@...hat.com,
greg@...ah.com
Subject: Re: [1/3] 2.6.21-rc6: known regressions
From: Linus Torvalds <torvalds@...ux-foundation.org>
Date: Fri, 13 Apr 2007 18:34:23 -0700 (PDT)
Let's see how related these two might actually be.
> On Sat, 14 Apr 2007, Adrian Bunk wrote:
> >
> > Subject : laptops with e1000: lockups
> > References : https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=229603
> > Submitter : Dave Jones <davej@...hat.com>
> > Handled-By : Jesse Brandeburg <jesse.brandeburg@...el.com>
> > Status : problem is being debugged
In this case the entire machine hangs and sometimes spits out an
NMI message.
The user confirms that using another network interface (albeit
wireless) works properly.
The Intel folks can reproduce this one in-house and will look more
deeply into it on Monday.
> > Subject : forcedeth: interface hangs under load
> > References : http://lkml.org/lkml/2007/4/3/39
> > Submitter : Ingo Molnar <mingo@...e.hu>
> > Handled-By : Ingo Molnar <mingo@...e.hu>
> > Ayaz Abdulla <aabdulla@...dia.com>
> > Status : problem is being debugged
In Ingo's case here the interface stops working entirely, but his
system is still otherwise operational.
I looked at the interrupt handler for this driver and it is absolutely
awful, especially in the NAPI-enabled case.
It tries to handle TX done interrupts and other status events in the
HW irq handler, and the RX packet processing via NAPI ->poll().
Time has shown that this is a faulty way to use NAPI and that all
event types should be handled in the NAPI ->poll() handler, not just
RX packet processing.
The way the loop is coded now it will keep prodding at the interrupt
status register in the HW irq handler loop even after the RX packet
processing has been deferred to NAPI ->poll(). It seems likely that
since the RX packets aren't being processed there, the RX irq event
status should keep showing as set as new packets arrive.
Really, the interrupt status should be checked exactly once, all the
work deferred to NAPI's ->poll() and then the HW interrupt handler
should return immediately. This is what e1000 and tg3 do, and it is
therefore the most well tested manner in which to use NAPI in a
network driver.
Anything else is racy and error prone.
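To make the shape of that concrete, here is a minimal userspace sketch, not
driver code. It assumes a made-up device model: struct fake_nic, the EV_*
bits, and both function names are hypothetical, and a plain flag stands in
for scheduling NAPI. The hard irq handler reads the status once, masks the
device interrupt, schedules the poll and returns. The poll routine then
handles every event type and only unmasks the interrupt when it finishes
under budget.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event bits; not taken from any real driver. */
#define EV_RX   0x1u
#define EV_TX   0x2u
#define EV_LINK 0x4u

/* Hypothetical device model standing in for real hardware state. */
struct fake_nic {
    uint32_t irq_status;    /* pending events as the hardware reports them */
    bool     irq_enabled;   /* device interrupt mask state */
    bool     poll_scheduled;/* stands in for the poll being scheduled */
    unsigned rx_pending;    /* packets waiting in the RX ring */
    unsigned rx_done, tx_done, link_events;
    unsigned status_reads;  /* status reads done by the hard irq handler */
};

/* Hard irq handler: check status exactly once, defer all work, return. */
static bool fake_nic_interrupt(struct fake_nic *nic)
{
    nic->status_reads++;
    if (nic->irq_status == 0)
        return false;           /* not our interrupt */

    nic->irq_enabled = false;   /* mask further device interrupts */
    nic->poll_scheduled = true; /* all event handling happens in poll */
    return true;
}

/* Poll routine: handles TX done, link events and RX, not just RX.
 * Returns the number of RX packets processed, at most 'budget'. */
static unsigned fake_nic_poll(struct fake_nic *nic, unsigned budget)
{
    unsigned work = 0;
    uint32_t events;

    assert(budget > 0);

    events = nic->irq_status;
    nic->irq_status = 0;        /* acknowledge what we are about to handle */

    if (events & EV_TX)
        nic->tx_done++;
    if (events & EV_LINK)
        nic->link_events++;

    while (nic->rx_pending > 0 && work < budget) {
        nic->rx_pending--;
        nic->rx_done++;
        work++;
    }

    /* Under budget means the ring is drained: stop polling, unmask. */
    if (work < budget) {
        nic->poll_scheduled = false;
        nic->irq_enabled = true;
    }
    return work;
}
```

With five packets pending and a budget of three, the first poll call uses
its whole budget and stays scheduled with the interrupt still masked. The
second call drains the remaining two and re-enables the interrupt. The hard
irq handler never loops on the status register.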
This would also eliminate the max_interrupt_work hack, which is a side
effect of the way the interrupt handler is implemented in this
driver.