netdev - Re: e100 problems in .23rc8 ?


Message-ID: <20071011003638.GA27174@redhat.com>
Date:	Wed, 10 Oct 2007 20:36:38 -0400
From:	Dave Jones <davej@...hat.com>
To:	Herbert Xu <herbert@...dor.apana.org.au>
Cc:	"Kok, Auke" <auke-jan.h.kok@...el.com>, netdev@...r.kernel.org,
	esandeen@...hat.com, dmack@...iper.net
Subject: Re: e100 problems in .23rc8 ?

On Thu, Sep 27, 2007 at 02:58:27PM +0800, Herbert Xu wrote:
 > Kok, Auke <auke-jan.h.kok@...el.com> wrote:
 > > Dave Jones wrote:
 > >> Last night, I hit this bug during boot up..
 > >> http://www.codemonkey.org.uk/junk/e100-2.jpg
 > >> 
 > >> This morning, I got a mail from a Fedora user of the same
 > >> .23-rc8 based kernel that has seen a different trace
 > >> also implicating e100..
 > >> 
 > >> http://www.codemonkey.org.uk/junk/e100.jpg
 > >> 
 > >> It may be that the two problems are unrelated, and it's
 > >> just coincidence that both reports happen to be on an e100,
 > >> but the timing is odd.  Have there been other reports
 > >> of similar problems recently ?
 > > 
 > > there hasn't been a change to e100 in two months now - perhaps something slipped
 > > into the stack that broke it? If this reproduces, could you bisect?

So I looked into this some more, after it reared its head again.
The problem with bisecting it is that it doesn't happen on every
boot, so it's difficult to determine the good/bad state.
I've never managed to reproduce it on 2.6.22 however.

 > Well this looks exactly like the e1000 race that we fixed around
 > the time of the last kernel release.  That fix never made it into
 > e100 so it's no surprise that we get a similar crash here.

We're starting to see more reports of this from Fedora users
now that 2.6.23 is final.  Once we push that as an update
for Fedora 7 users, it's likely we'll see even more.
(likewise for the soon-to-be released F8, based on 2.6.23)

The e1000 changes you reference above, is this the changeset you mean?

commit 416b5d10afdc797c21c457ade3714e8f2f75edd9
Author: Auke Kok <auke-jan.h.kok@...el.com>
Date:   Fri Jun 1 10:22:39 2007 -0700

    e1000: disable polling before registering netdevice


 > The problem is that if a spurious interrupt comes in between
 > request_irq and netif_poll_enable then you'll get a crash at
 > the next netif_rx_complete.
 > 
 > It'd be good if this were reproducible as it would allow us
 > to identify the source of the spurious interrupt, which may
 > well be caused by an unrelated bug somewhere else.
 > 
 > In any case, e100 should be prepared to deal with spurious
 > interrupts as e1000 has been fixed to do.
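The race Herbert describes can be sketched as a toy userspace model. This is
not kernel code: every toy_* name is hypothetical, and the model assumes that
the "poll scheduled" state is a single bit which poll-disable holds,
poll-enable clears unconditionally, and the completion path expects to
still find set. Under those assumptions it shows why a spurious interrupt
that lands before the enable call leaves the completion path inconsistent,
and why disabling polling first (as the e1000 commit title suggests) avoids it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the per-device polling state (hypothetical). */
struct toy_dev {
    bool rx_sched;      /* set while a poll is scheduled or polling is disabled */
    bool on_poll_list;  /* device is queued to be polled */
};

/* Models poll-disable: hold the bit so the interrupt path cannot schedule. */
static void toy_poll_disable(struct toy_dev *d) { d->rx_sched = true; }

/* Models poll-enable: release the bit unconditionally. */
static void toy_poll_enable(struct toy_dev *d) { d->rx_sched = false; }

/* Models the interrupt handler: schedule a poll only if the bit was clear. */
static bool toy_irq(struct toy_dev *d)
{
    if (d->rx_sched)
        return false;           /* already scheduled, or polling disabled */
    d->rx_sched = true;
    d->on_poll_list = true;
    return true;
}

/* Models poll completion. Returns false where real code would trip its
 * sanity check, because the bit it expects to own has been cleared. */
static bool toy_rx_complete(struct toy_dev *d)
{
    if (!d->rx_sched)
        return false;
    d->on_poll_list = false;
    d->rx_sched = false;
    return true;
}

/* Open sequence with a spurious interrupt arriving after the IRQ is
 * requested but before polling is enabled. Returns true if the state
 * stays consistent. */
static bool toy_open_with_spurious_irq(bool disable_first)
{
    struct toy_dev d = { false, false };

    if (disable_first)
        toy_poll_disable(&d);   /* done early, before the device is visible */

    /* the IRQ would be requested here */
    toy_irq(&d);                /* spurious interrupt */
    toy_poll_enable(&d);
    assert(!d.rx_sched);        /* enable always leaves the bit clear */

    if (!d.on_poll_list)
        return true;            /* nothing was scheduled, nothing to complete */
    return toy_rx_complete(&d); /* the queued poll eventually completes */
}
```

In this model, toy_open_with_spurious_irq(false) reports the inconsistency,
while toy_open_with_spurious_irq(true) does not, because the interrupt
handler declines to schedule a poll while the bit is held.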

Adding some of the other reporters of this bug to Cc,
in case they've found this more reproducible than myself
(maybe they'll have more luck bisecting).

	Dave

-- 
http://www.codemonkey.org.uk