Message-Id: <20070413.222941.123978678.davem@davemloft.net>
Date:	Fri, 13 Apr 2007 22:29:41 -0700 (PDT)
From:	David Miller <davem@...emloft.net>
To:	torvalds@...ux-foundation.org
Cc:	bunk@...sta.de, akpm@...ux-foundation.org, jgarzik@...ox.com,
	netdev@...r.kernel.org, e1000-devel@...ts.sourceforge.net,
	mingo@...e.hu, aabdulla@...dia.com, davej@...hat.com,
	greg@...ah.com
Subject: Re: [1/3] 2.6.21-rc6: known regressions

From: Linus Torvalds <torvalds@...ux-foundation.org>
Date: Fri, 13 Apr 2007 18:34:23 -0700 (PDT)

Let's see how related these two might actually be.

> On Sat, 14 Apr 2007, Adrian Bunk wrote:
> > 
> > Subject    : laptops with e1000: lockups
> > References : https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=229603
> > Submitter  : Dave Jones <davej@...hat.com>
> > Handled-By : Jesse Brandeburg <jesse.brandeburg@...el.com>
> > Status     : problem is being debugged

In this case the entire machine hangs and sometimes spits out an
NMI message.

The user confirms that using another network interface (albeit
wireless) works properly.

The Intel folks can reproduce this one in-house and will look more
deeply into it on Monday.

> > Subject    : forcedeth: interface hangs under load
> > References : http://lkml.org/lkml/2007/4/3/39
> > Submitter  : Ingo Molnar <mingo@...e.hu>
> > Handled-By : Ingo Molnar <mingo@...e.hu>
> >              Ayaz Abdulla <aabdulla@...dia.com>
> > Status     : problem is being debugged

In Ingo's case here the interface stops working entirely, but his
system is still otherwise operational.

I looked at the interrupt handler for this driver, and it is absolutely
awful, especially in the NAPI-enabled case.

It tries to handle TX done interrupts and other status events in the
HW irq handler, and the RX packet processing via NAPI ->poll().

Time has shown that this is a faulty way to use NAPI, and that all
event types should be handled in the NAPI ->poll() handler, not just
RX packet processing.

The way the loop is coded now, it will keep prodding at the interrupt
status register in the HW irq handler loop even after RX packet
processing has been deferred to NAPI ->poll().  Since the RX packets
aren't being processed there, the RX irq event status will likely keep
showing as set as new packets arrive.

Really, the interrupt status should be checked exactly once, all the
work deferred to NAPI's ->poll(), and then the HW interrupt handler
should return immediately.  This is what e1000 and tg3 do, and it is
therefore the best-tested way to use NAPI in a network driver.

Anything else is racy and error-prone.

This would also eliminate the max_interrupt_work hack, which is a side
effect of the way the interrupt handler is implemented in this
driver.