Date:	Thu, 6 Dec 2007 09:23:12 -0800
From:	Stephen Hemminger <shemminger@...ux-foundation.org>
To:	David Miller <davem@...emloft.net>
Cc:	ilpo.jarvinen@...sinki.fi, netdev@...r.kernel.org
Subject: Re: TCP event tracking via netlink...

On Thu, 06 Dec 2007 02:33:46 -0800 (PST)
David Miller <davem@...emloft.net> wrote:

> From: "Ilpo_Järvinen" <ilpo.jarvinen@...sinki.fi>
> Date: Thu, 6 Dec 2007 01:18:28 +0200 (EET)
> 
> > On Wed, 5 Dec 2007, David Miller wrote:
> > 
> > > I assume you're using something like carefully crafted printk's,
> > > kprobes, or even ad-hoc statistic counters.  That's what I used to do
> > > :-)
> > 
> > No, that's not at all what I do :-). I usually look at time-seq graphs
> > except for the cases when I just find things out by reading code (or
> > by just thinking of it).
> 
> Can you briefly detail what graph tools and command lines
> you are using?
> 
> The last time I did graphing to analyze things, the tools
> were hit-or-miss.
> 
> > Much of the info is available in tcpdump already, it's just hard to read
> > without graphing it first because there are so many overlapping things
> > to track in two-dimensional space.
> > 
> > ...But yes, I have to admit that a couple of problems come to mind
> > where having some variable from tcp_sock would have made the problem
> > more obvious.
> 
> The most important are cwnd and ssthresh, which you could guess at
> from graphs, but it is important to know on a packet-to-packet basis
> why we might have sent a packet or not, because this has rippling
> effects down the rest of the RTT.
> 
> > Not sure what the benefit of having distributions ship it would be,
> > because those people hardly ever report problems here anyway; they're
> > just too happy with TCP performance unless we print something to their
> > logs, which implies that we must set up a *_ON() condition :-(.
> 
> That may be true, but if we could integrate the information with
> tcpdumps, we could gather internal state using tools the user
> already has available.
> 
> Imagine if tcpdump printed out:
> 
> 02:26:14.865805 IP $SRC > $DEST: . 11226:12686(1460) ack 0 win 108
> 	ss_thresh: 129 cwnd: 133 packets_out: 132
> 
> or something like that.
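> 
> For that, the kernel side only needs to emit one compact record per
> segment event for such a tool to consume.  Purely as an illustration
> (this is not the layout from the actual patch; the field set is just
> what the example line above would need):
> 
>         /* One record per sent/received segment; kernel integer
>          * types as in <linux/types.h>.  Illustrative only. */
>         struct tcp_log_event {
>                 __u64   tstamp_us;      /* microseconds, to match pcap */
>                 __be32  saddr, daddr;   /* connection 4-tuple */
>                 __be16  sport, dport;
>                 __u32   snd_cwnd;       /* congestion window (packets) */
>                 __u32   snd_ssthresh;   /* slow start threshold */
>                 __u32   packets_out;    /* segments in flight */
>                 __u32   snd_nxt;        /* next seq to send */
>                 __u32   snd_una;        /* first unacked seq */
>         };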
> 
> > Some problems are simply such that things cannot be accurately verified
> > without high processing overhead until it's far too late (eg skb bits vs
> > *_out counters). Maybe we should start to build an expensive state
> > validator as well, which would automatically check invariants of the
> > write queue and tcp_sock in a straightforward, unoptimized manner? That
> > would definitely do a lot of work for us: just ask people to turn it on
> > and it spits out everything that went wrong :-) (unless they really
> > depend on very high-speed things and are therefore unhappy if we scan
> > thousands of packets unnecessarily per ACK :-)). ...Early enough!
> > ...That would work also for distros, but there's always human judgement
> > needed to decide whether the bug reporter will be happy when his TCP
> > processing no longer scales ;-).
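> > 
> > Just to sketch it, such a validator would recount the *_out counters
> > from the skb bits on every call and complain on any mismatch;
> > something schematic like this (untested, unoptimized by design):
> > 
> >         static void tcp_verify_out_counters(struct sock *sk)
> >         {
> >                 const struct tcp_sock *tp = tcp_sk(sk);
> >                 u32 packets = 0, sacked = 0, lost = 0, retrans = 0;
> >                 struct sk_buff *skb;
> > 
> >                 /* Walk the whole write queue up to the send head,
> >                  * recomputing what the cached counters should be. */
> >                 tcp_for_write_queue(skb, sk) {
> >                         if (skb == tcp_send_head(sk))
> >                                 break;
> >                         packets += tcp_skb_pcount(skb);
> >                         if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)
> >                                 sacked += tcp_skb_pcount(skb);
> >                         if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
> >                                 lost += tcp_skb_pcount(skb);
> >                         if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_RETRANS)
> >                                 retrans += tcp_skb_pcount(skb);
> >                 }
> >                 WARN_ON(packets != tp->packets_out);
> >                 WARN_ON(sacked != tp->sacked_out);
> >                 WARN_ON(lost != tp->lost_out);
> >                 WARN_ON(retrans != tp->retrans_out);
> >         }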
> 
> I think it's useful as a TCP_DEBUG config option or similar, sure.
> 
> But sometimes the algorithms are working as designed; it's just that
> they provide poor pipe utilization, and CWND analysis embedded inside
> of a tcpdump would be one way to see that, as well as to determine the
> flaw in the algorithm.
> 
> > ...Hopefully you found any of my comments useful.
> 
> Very much so, thanks.
> 
> I put together a sample implementation anyway just to show the idea,
> against net-2.6.25 below.
> 
> It is untested since I didn't write the userland app yet to see that
> proper things get logged.  Basically you could run a daemon that
> writes per-connection traces into files based upon the incoming
> netlink events.  Later, using the binary pcap file and these traces,
> you can piece together output like the above, using the timestamps
> etc. to match up pcap packets to ones from the TCP logger.
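> 
> The daemon itself can stay tiny: bind a netlink socket to the logger's
> multicast group and append each record to a file keyed by connection.
> A rough userland skeleton (the protocol and group numbers here are
> placeholders, not what the patch actually registers):
> 
>         #include <stdio.h>
>         #include <sys/socket.h>
>         #include <linux/netlink.h>
> 
>         #define TCPLOG_GROUP 1  /* placeholder multicast group */
> 
>         int main(void)
>         {
>                 struct sockaddr_nl sa = {
>                         .nl_family = AF_NETLINK,
>                         .nl_groups = 1 << (TCPLOG_GROUP - 1),
>                 };
>                 char buf[8192];
>                 int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_USERSOCK);
> 
>                 if (fd < 0 || bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
>                         perror("tcplog");
>                         return 1;
>                 }
>                 for (;;) {
>                         ssize_t len = recv(fd, buf, sizeof(buf), 0);
> 
>                         if (len <= 0)
>                                 break;
>                         /* walk the nlmsghdrs in buf, extract each
>                          * record, pick the trace file by 4-tuple
>                          * and append */
>                 }
>                 return 0;
>         }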
> 
> The userland tools could do analysis and print pre-cooked state diff
> logs, like "this ACK raised CWND by one" or whatever else you wanted
> to know.
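> 
> The "this ACK raised CWND by one" output then falls out of a plain
> field-by-field diff of consecutive records for one connection,
> roughly like this (assuming the illustrative tcp_log_event and
> <stdio.h> from the sketches above):
> 
>         /* Print whichever state variables changed between two
>          * consecutive records of the same connection. */
>         static void print_state_diff(const struct tcp_log_event *prev,
>                                      const struct tcp_log_event *cur)
>         {
>                 if (cur->snd_cwnd != prev->snd_cwnd)
>                         printf("cwnd: %u -> %u\n",
>                                prev->snd_cwnd, cur->snd_cwnd);
>                 if (cur->snd_ssthresh != prev->snd_ssthresh)
>                         printf("ssthresh: %u -> %u\n",
>                                prev->snd_ssthresh, cur->snd_ssthresh);
>                 if (cur->packets_out != prev->packets_out)
>                         printf("packets_out: %u -> %u\n",
>                                prev->packets_out, cur->packets_out);
>         }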
> 
> It's nice that an expert like you can look at graphs and understand,
> but we'd like to create more experts, and besides reading code, one
> way to become an expert is to be able to extract live real data
> from the kernel's working state and try to understand how things
> got that way.  Currently this information is permanently lost.


Tools and scripts for testing that generate graphs are at:
	git://git.kernel.org/pub/scm/tcptest/tcptest