Message-ID: <Pine.LNX.4.64.0805261956090.24924@wrl-59.cs.helsinki.fi>
Date: Mon, 26 May 2008 20:08:37 +0300 (EEST)
From: "Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To: Ingo Molnar <mingo@...e.hu>
cc: LKML <linux-kernel@...r.kernel.org>,
Netdev <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
"Rafael J. Wysocki" <rjw@...k.pl>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
On Mon, 26 May 2008, Ingo Molnar wrote:
>
> * Ilpo Järvinen <ilpo.jarvinen@...sinki.fi> wrote:
>
> > > > Hmm, readfds is NULL isn't it?!? Are you sure you straced the
> > > > right process?
> > >
> > > yes, i'm stracing the task that is hung unexpectedly.
> >
> > But that wasn't the receiving process? (I didn't quickly find into
> > which direction distcc ports go, so I couldn't confirm this). If you
> > still have that situation at hand, could you check which is the
> > receiving process (e.g., using netstat -p, the end which has Recv-Q is
> > the right one) and where it's stuck?
>
> it wasn't the receiving process. There's no receiving process - which is
> weird:
>
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
> tcp        0 207232 europe:37198            europe:distcc           ESTABLISHED 19578/distcc
> tcp        0      0 europe:ssh              dione:36284             ESTABLISHED -
> tcp        0      0 europe:ssh              e2:45910                ESTABLISHED -
> tcp    72283      0 europe:distcc           europe:37198            ESTABLISHED -
Just to be sure (please forgive me if you find this nearly an insult :-)),
did you have sufficient rights to find out the pid (i.e., if that process is
not owned by you, you need superuser privs for that)?
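
To make the check concrete, here is a rough sketch (not something that was run as part of this thread) of picking out the receiving end from netstat -p style output: the row with a non-zero Recv-Q, plus whether netstat could name its owner. The names find_backlogged and SAMPLE are made up for illustration, and the sample text is simply the listing quoted above. A "-" in the last column only means netstat could not show an owner; that can be a lack of privileges or a socket with no owning process, and the sketch cannot tell the two apart.

```python
# Hypothetical helper: scan `netstat -p` style text for TCP rows with unread
# data (Recv-Q > 0) and report whether an owning PID/program was shown.
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 207232 europe:37198 europe:distcc ESTABLISHED 19578/distcc
tcp 0 0 europe:ssh dione:36284 ESTABLISHED -
tcp 0 0 europe:ssh e2:45910 ESTABLISHED -
tcp 72283 0 europe:distcc europe:37198 ESTABLISHED -
"""


def find_backlogged(netstat_text):
    """Return one dict per TCP row whose Recv-Q is non-zero."""
    rows = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Data rows start with the protocol name; skip headers and blanks.
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        recv_q = int(fields[1])
        if recv_q == 0:
            continue
        # The program name may contain spaces, so rejoin the tail.
        owner = " ".join(fields[6:])
        rows.append({
            "recv_q": recv_q,
            "send_q": int(fields[2]),
            "local": fields[3],
            "foreign": fields[4],
            "state": fields[5],
            "owner": owner,
            # "-" means netstat did not show a PID for this socket.
            "owner_known": owner != "-",
        })
    return rows


if __name__ == "__main__":
    for row in find_backlogged(SAMPLE):
        print(row["local"], "<-", row["foreign"],
              "Recv-Q:", row["recv_q"],
              "owner:", row["owner"])
```

If the owner still comes out as "-" when the listing is taken as root, then the missing pid is not a permissions artifact.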
> i just gave it as a general example of why sometimes stracing a task can
> 'disturb' the observed system and can kick the TCP state machine out of
> a stall. I did not say it's occurring here.
Yeah, I understood that earlier. Similarly, I just wanted to point out
the end where the problem lies :-).
> > ...It may still be that the receiving process is stuck due to the
> > non-net related changes you have there.
>
> the socket does not seem to be owned. It should have closed down?
> Refcounting issue?
It's quite possible that, e.g., net namespaces have some bug in the
handling of orphaned TCP sockets.
> find below the sysrq-t dump.
...I'll have a look into that as well (though with such dumps I'm on less
familiar territory, so it will take a moment).
--
i.