Message-ID: <20080721140128.GA32245@elte.hu>
Date: Mon, 21 Jul 2008 16:01:28 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Evgeniy Polyakov <johnpol@....mipt.ru>
Cc: Pekka Enberg <penberg@...helsinki.fi>,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
Vegard Nossum <vegard.nossum@...il.com>,
"Rafael J. Wysocki" <rjw@...k.pl>, cl@...ux-foundation.org,
davem@...emloft.net
Subject: Re: [bug, netconsole, SLUB] BUG skbuff_head_cache: Poison overwritten
* Evgeniy Polyakov <johnpol@....mipt.ru> wrote:
> On Mon, Jul 21, 2008 at 01:55:55PM +0200, Ingo Molnar (mingo@...e.hu) wrote:
> > > > I could try run tests with netconsole deactivated, if you think
> > > > that's a worthwhile line of probing this problem. (although that
> > > > would make me do blind tests in essence - having kernel log output
> > > > is really essential.)
> > >
> > > Let's try this way first. If the system continues to crash, we will
> > > add some debug options in various paths. Existing reports do not
> > > contain enough information unfortunately, so we will not lose too
> > > much.
> >
> > ok. I've turned off netconsole - 8 successful bootups in a row so far.
> > The box is a slow booter/builder with an 8 kernels/hour test throughput,
> > so if everything goes fine we should have meaningful results in about 10
> > hours.
> >
> > ( there are other, faster testboxes in -tip testing with 33 kernels/hour
> > build+boot throughput where we'd have to wait only 2 hours - but as
> > per Murphy's law they don't trigger this bug ;-)
>
> Since 2.6.25 there has been only a single change in netpoll.c:
> f5184d267c1aedb9b7a8cc44e08ff6b8d382c3b5
> Which looks innocent.
>
> Is your driver e1000 or e1000e? Can you check the other one?
i cannot check e1000 anymore due to this upstream commit:
| d03157babed7424f5391af43200593768ce69c9a is first bad commit
| commit d03157babed7424f5391af43200593768ce69c9a
| Author: Auke Kok <auke-jan.h.kok@...el.com>
| Date: Sun Jun 22 15:21:29 2008 -0700
|
| e1000: remove PCI Express device IDs
|
| We do not want to prolong the situation much longer that e1000
| and e1000e support these devices at the same time. As a result,
| take out the bandage that was added for the interim period
| and remove all the PCI Express device IDs from e1000.
but yes, this box was using e1000 for a long time, and recently migrated
to e1000e. I'm not sure there's any connection, do you think there is?
Ingo