Message-ID: <20071206092312.79a83208@freepuppy.rosehill>
Date: Thu, 6 Dec 2007 09:23:12 -0800
From: Stephen Hemminger <shemminger@...ux-foundation.org>
To: David Miller <davem@...emloft.net>
Cc: ilpo.jarvinen@...sinki.fi, netdev@...r.kernel.org
Subject: Re: TCP event tracking via netlink...
On Thu, 06 Dec 2007 02:33:46 -0800 (PST)
David Miller <davem@...emloft.net> wrote:
> From: "Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
> Date: Thu, 6 Dec 2007 01:18:28 +0200 (EET)
>
> > On Wed, 5 Dec 2007, David Miller wrote:
> >
> > > I assume you're using something like carefully crafted printk's,
> > > kprobes, or even ad-hoc statistic counters. That's what I used to do
> > > :-)
> >
> > No, that's not at all what I do :-). I usually look at time-seq graphs,
> > except for the cases when I just find things out by reading the code (or
> > by just thinking about it).
>
> Can you briefly detail what graph tools and command lines
> you are using?
>
> The last time I did graphing to analyze things, the tools
> were hit-or-miss.
>
> > Much of the info is available in tcpdump already, it's just hard to read
> > without graphing it first because there are so many overlapping things
> > to track in two-dimensional space.
> >
> > ...But yes, I have to admit that a couple of problems come to mind
> > where having some variable from tcp_sock would have made the problem
> > more obvious.
>
> The most important are cwnd and ssthresh, which you could guess from
> graphs, but it is important to know on a packet-to-packet basis why we
> might or might not have sent a packet, because this has rippling effects
> down the rest of the RTT.
>
> > Not sure what the benefit of having it in distributions is, because
> > those people hardly ever report problems here anyway; they're just too
> > happy with TCP performance unless we print something to their logs,
> > which implies that we must set up a *_ON() condition :-(.
>
> That may be true, but if we could integrate the information with
> tcpdumps, we could gather internal state using tools the user
> already has available.
>
> Imagine if tcpdump printed out:
>
> 02:26:14.865805 IP $SRC > $DEST: . 11226:12686(1460) ack 0 win 108
> ss_thresh: 129 cwnd: 133 packets_out: 132
>
> or something like that.
>
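
For concreteness, here is a rough sketch of the per-packet record the
userland merge tool would need, and how it could print that second line
under the matching tcpdump packet. The struct layout and names
(tcplog_rec etc.) are invented for illustration, not taken from David's
patch.

#include <stdio.h>

/* Placeholder for the state snapshot the kernel would export per
 * sent segment / processed ACK; field names are illustrative only. */
struct tcplog_rec {
	unsigned long long tstamp_us;	/* microseconds, to match pcap */
	unsigned int snd_ssthresh;
	unsigned int snd_cwnd;
	unsigned int packets_out;
};

/* Print the annotation line underneath the matching tcpdump line. */
static void print_annotation(const struct tcplog_rec *r)
{
	printf("    ss_thresh: %u cwnd: %u packets_out: %u\n",
	       r->snd_ssthresh, r->snd_cwnd, r->packets_out);
}
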
> > Some problems are simply such that things cannot be accurately verified
> > without high processing overhead until it's far too late (e.g. skb bits
> > vs. *_out counters). Maybe we should start to build an expensive state
> > validator as well, which would automatically check invariants of the
> > write queue and tcp_sock in a straightforward, unoptimized manner? That
> > would definitely do a lot of work for us: just ask people to turn it on
> > and it spits out everything that went wrong :-) (unless they really
> > depend on very high-speed things and are therefore unhappy if we scan
> > thousands of packets unnecessarily per ACK :-)). ...Early enough!
> > ...That would work also for distros, but there's always human judgement
> > needed to decide whether the bug reporter will be happy when his TCP
> > processing no longer scales ;-).
>
> I think it's useful as a TCP_DEBUG config option or similar, sure.
>
> But sometimes the algorithms are working as designed; it's just that
> they provide poor pipe utilization, and CWND analysis embedded inside
> of a tcpdump would be one way to see that, as well as to determine the
> flaw in the algorithm.
>
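
The expensive validator idea above could be as simple as recounting the
*_out counters from the per-skb bits on every call and warning on any
mismatch. A rough sketch, assuming the 2.6.24-era write queue helpers
(tcp_for_write_queue(), tcp_send_head(), tcp_skb_pcount()); the function
name is made up and where to hook it is left open:

#include <net/tcp.h>

/* Recount the *_out counters from the per-skb state and complain if
 * they disagree with tcp_sock; purely a debugging aid, name invented. */
static void tcp_debug_verify_wq(struct sock *sk)
{
	struct tcp_sock *tp = tcp_sk(sk);
	struct sk_buff *skb;
	u32 packets = 0, sacked = 0, lost = 0, retrans = 0;

	tcp_for_write_queue(skb, sk) {
		u8 flags = TCP_SKB_CB(skb)->sacked;

		if (skb == tcp_send_head(sk))
			break;		/* the rest is not yet sent */

		packets += tcp_skb_pcount(skb);
		if (flags & TCPCB_SACKED_ACKED)
			sacked += tcp_skb_pcount(skb);
		if (flags & TCPCB_LOST)
			lost += tcp_skb_pcount(skb);
		if (flags & TCPCB_SACKED_RETRANS)
			retrans += tcp_skb_pcount(skb);
	}

	WARN_ON(packets != tp->packets_out);
	WARN_ON(sacked != tp->sacked_out);
	WARN_ON(lost != tp->lost_out);
	WARN_ON(retrans != tp->retrans_out);
}

Called from, say, the end of ACK processing under a config option, it
would catch counter drift immediately rather than whenever the mismatch
later triggers something visible.
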
> > ...Hopefully you found any of my comments useful.
>
> Very much so, thanks.
>
> I put together a sample implementation anyway just to show the idea,
> against net-2.6.25 below.
>
> It is untested since I haven't written the userland app yet to verify
> that the proper things get logged. Basically, you could run a daemon that
> writes per-connection traces into files based upon the incoming
> netlink events. Later, using the binary pcap file and these traces,
> you can piece together traces like the above, using the timestamps
> etc. to match up pcap packets with the ones from the TCP logger.
>
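
For the daemon side, something along these lines might be enough to
start with: bind a netlink socket, read events, and write them out.
NETLINK_TCPLOG, TCPLOG_GRP and struct tcplog_rec are placeholders
(the record sketched earlier, with the 4-tuple added for demuxing);
the real numbers and layout would come from whatever the patch defines.

#include <stdio.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#define NETLINK_TCPLOG	21	/* placeholder protocol number */
#define TCPLOG_GRP	1	/* placeholder multicast group */

struct tcplog_rec {		/* placeholder event layout */
	unsigned long long tstamp_us;
	unsigned int saddr, daddr;
	unsigned short sport, dport;
	unsigned int snd_cwnd, snd_ssthresh, packets_out;
};

int main(void)
{
	struct sockaddr_nl addr = { .nl_family = AF_NETLINK,
				    .nl_groups = TCPLOG_GRP };
	char buf[8192];
	int fd, len;

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_TCPLOG);
	if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		return 1;

	while ((len = recv(fd, buf, sizeof(buf), 0)) > 0) {
		struct nlmsghdr *nlh = (struct nlmsghdr *)buf;

		for (; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
			struct tcplog_rec *r = NLMSG_DATA(nlh);

			/* a real daemon would pick a per-connection file
			 * keyed on the 4-tuple; one line per event here */
			printf("%llu cwnd %u ssthresh %u packets_out %u\n",
			       r->tstamp_us, r->snd_cwnd,
			       r->snd_ssthresh, r->packets_out);
		}
	}
	return 0;
}

Matching these records against pcap packets by timestamp (and 4-tuple)
is then purely a userland problem, which is the attraction: the kernel
only has to emit the raw state.
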
> The userland tools could do analysis and print pre-cooked state diff
> logs, like "this ACK raised CWND by one" or whatever else you wanted
> to know.
>
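
The state-diff output could fall out of the same tool almost for free:
keep the previous record per connection and describe what changed. A
sketch, again using the placeholder struct tcplog_rec from above:

static void print_state_diff(const struct tcplog_rec *prev,
			     const struct tcplog_rec *cur)
{
	if (cur->snd_cwnd > prev->snd_cwnd)
		printf("%llu: this ACK raised cwnd by %u (%u -> %u)\n",
		       cur->tstamp_us, cur->snd_cwnd - prev->snd_cwnd,
		       prev->snd_cwnd, cur->snd_cwnd);
	else if (cur->snd_cwnd < prev->snd_cwnd)
		printf("%llu: cwnd cut from %u to %u, ssthresh now %u\n",
		       cur->tstamp_us, prev->snd_cwnd, cur->snd_cwnd,
		       cur->snd_ssthresh);
}
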
> It's nice that an expert like you can look at graphs and understand
> them, but we'd like to create more experts, and besides reading code,
> one way to become an expert is to be able to extract live, real data
> from the kernel's working state and try to understand how things
> got that way. Currently, this information is permanently lost.
Tools and scripts for testing that generate graphs are at:
git://git.kernel.org/pub/scm/tcptest/tcptest