Message-ID: <20090401113520.GA2667@gentoox2.trippelsdorf.de>
Date: Wed, 1 Apr 2009 13:35:20 +0200
From: Markus Trippelsdorf <markus@...ppelsdorf.de>
To: Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
Cc: Netdev <netdev@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
David Miller <davem@...emloft.net>
Subject: Re: WARNING: at net/ipv4/tcp_input.c:2927 tcp_ack+0xd55/0x1991()

On Wed, Apr 01, 2009 at 02:09:11PM +0300, Ilpo Järvinen wrote:
> On Tue, 31 Mar 2009, Markus Trippelsdorf wrote:
>
> > On Tue, Mar 31, 2009 at 12:16:51PM +0300, Ilpo Järvinen wrote:
> > > On Tue, 31 Mar 2009, Markus Trippelsdorf wrote:
> > >
> > > > On Mon, Mar 30, 2009 at 09:52:55PM +0300, Ilpo Järvinen wrote:
> > > > > On Mon, 30 Mar 2009, Markus Trippelsdorf wrote:
> > > > >
> > > > > > On Mon, Mar 30, 2009 at 07:01:22PM +0300, Ilpo Järvinen wrote:
> > > > > > > On Sat, 28 Mar 2009, Markus Trippelsdorf wrote:
> > > > > > > > On Sat, Mar 28, 2009 at 10:29:58AM +0200, Ilpo Järvinen wrote:
> > > > > > >
> > > > > > > ...And, let me guess, you're in X and therefore unable to catch a final
> > > > > > > oops if one is printed? It would be nice to get around that as well:
> > > > > > > either use serial/netconsole or stay in text mode while waiting for the
> > > > > > > crash (it shouldn't be too hard if you are able to set up the workload
> > > > > > > first and then switch away from X, and if reproducing takes about an hour)...
> > > > > >
> > > > > > OK, I will try this later.
> > > > >
> > > > > > Let's hope that gives some clue where it ends up going boom (if it is
> > > > > caused by TCP we certainly should see something more sensible in console
> > > > > than just a hang)... ...I once again read through tcp commits but just
> > > > > cannot find anything that could cause fackets_out miscount, not to speak
> > > > > of crash prone changes so we'll just have to wait and see...
> > > >
> > > > > The machine hung again overnight and I took two pictures:
> > > > http://www.mypicx.com/uploadimg/1055813374_03302009_2.jpg
> > > > http://www.mypicx.com/uploadimg/1543678904_03302009_1.jpg
> > > >
> > > > But this time there was no tcp related warning in the logs.
> > >
> > > > Right. If that oops is hit often enough, one can easily associate the
> > > > warning with the hang even though the two are unrelated (the fact that the
> > > > final oops always goes unnoticed in X amplifies the effect).
> > >
> > > > > I then pulled the latest git changes, rebuilt, rebooted and set up the
> > > > workload again. The machine was still up and running in the morning
> > > > (~4 hours uptime). So it may well be that the issue was fixed with
> > > > the latest changes.
> > >
> > > > Let's hope so; in any case, if you still see hangs please get the final oops.
> > >
> > > > If it ever occurs again I will notify you.
> >
> > It happened again. In this case it took ~10 minutes from the warning to
> > the final crash. I'm pretty sure there must be some kind of relation
> > between the two. How else could one explain that the machine crashes just
> > minutes after _each_ instance of that WARNING?
>
> Here's my try #1... It should silence the warning (btw, we would have seen
> a later warning in the console and finally an oops due to a NULL
> dereference, had you been able to capture it) and hopefully doesn't
> introduce any other problem of any kind. So far I have only
> compile-tested it.

Many thanks for the quick fix. I will try it here ASAP.
(Hopefully modesetting support for Radeon cards will be ready shortly,
so that I could capture oopses more easily...)
--
Markus