Message-ID: <5491EB7D.7020909@psc.edu>
Date: Wed, 17 Dec 2014 15:45:49 -0500
From: rapier <rapier@....edu>
To: Yuchung Cheng <ycheng@...gle.com>, Blake Matheny <bmatheny@...com>
CC: Eric Dumazet <eric.dumazet@...il.com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Laurent Chavey <chavey@...gle.com>, Martin Lau <kafai@...com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Hannes Frederic Sowa <hannes@...essinduktion.org>,
Steven Rostedt <rostedt@...dmis.org>,
Lawrence Brakmo <brakmo@...com>, Josef Bacik <jbacik@...com>,
Kernel Team <Kernel-team@...com>
Subject: Re: [RFC PATCH net-next 0/5] tcp: TCP tracer
On 12/15/14 2:56 PM, Yuchung Cheng wrote:
> On Mon, Dec 15, 2014 at 8:08 AM, Blake Matheny <bmatheny@...com> wrote:
>>
>> We have an additional set of patches for web10g that builds on these
>> tracepoints. It can be made to work either way, but I agree the idea of
>> something like a sockopt would be really nice.
>
> I'd like to compare these patches with tools that parse pcap files to
> generate per-flow counters to collect RTTs, #dupacks, etc. What
> additional value or insight do they provide to improve/debug TCP
> performance? Maybe an example?
So this is our use scenario:

If the stack is instrumented on a per-flow basis we can gather metrics
proactively. That data can likely be processed on a near real-time basis
to get at least a general idea of the health of the flow (dupacks,
congestion events, spurious RTOs, etc.). It's possible we can use this
data to provisionally flag flows during the lifespan of the transfer. If
we store the collected metrics, NOC engineers can access them to make a
final determination about performance and then start the resolution
process immediately using data collected in situ. With web10g we do
collect stack data, but we are also collecting information about the
path and about the interaction between the application and the stack.
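To make that concrete, here's a rough userspace sketch of the sort of
per-flow sampling we'd like to do, using the existing TCP_INFO getsockopt
as a stand-in for the richer instrumentation (the log_flow_health()
helper and how often it would be called are purely illustrative, not
part of these patches):

/* Rough sketch: poll per-flow TCP state from userspace via TCP_INFO.
 * This only sees what the stack already exports; the tracepoint/web10g
 * approach gives more (path and application interaction), but the kind
 * of per-flow record we want to log looks roughly like this.
 */
#include <stdio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* 'fd' is assumed to be a connected TCP socket owned by the transfer tool. */
static int log_flow_health(int fd)
{
	struct tcp_info ti;
	socklen_t len = sizeof(ti);

	if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) < 0)
		return -1;

	/* rtt/rttvar are reported in microseconds */
	printf("rtt=%uus rttvar=%uus cwnd=%u ssthresh=%u "
	       "retrans=%u total_retrans=%u lost=%u unacked=%u\n",
	       ti.tcpi_rtt, ti.tcpi_rttvar,
	       ti.tcpi_snd_cwnd, ti.tcpi_snd_ssthresh,
	       ti.tcpi_retrans, ti.tcpi_total_retrans,
	       ti.tcpi_lost, ti.tcpi_unacked);
	return 0;
}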
This scenario is particularly appealing in the realm of big data
science. We're currently working with datasets that are hundreds of TBs
in size and will soon be dealing with multiple PBs as a matter of
course. In many cases we're aware of the path characteristics in advance
via SDN so we can apply the macroscopic model and see when we're
dropping below thresholds for that path. Since we're doing most of
transfers between loosely federated sets of distantly located transfer
nodes we don't generally have access to the far end of the connection
which might be the right place to collect the pcap data.
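By "macroscopic model" I mean the usual Mathis et al. approximation,
throughput <= (MSS/RTT) * (C/sqrt(p)); with a known path RTT and an
engineered loss target we can precompute a floor and flag flows that
fall well below it. Something along these lines, where the example
numbers and the 50% threshold are just placeholders:

/* Rough sketch of the threshold check: compare achieved goodput against
 * the Mathis et al. macroscopic bound  BW <= (MSS/RTT) * (C/sqrt(p)).
 * mss in bytes, rtt in seconds, loss as a probability; C ~ sqrt(3/2).
 */
#include <math.h>
#include <stdio.h>

static double mathis_bound_bps(double mss, double rtt, double loss)
{
	const double C = 1.22;	/* ~sqrt(3/2) */

	return (mss * 8.0 / rtt) * (C / sqrt(loss));
}

int main(void)
{
	/* example numbers only: 9000-byte MSS, 40 ms path, 1e-6 loss target */
	double floor_bps = mathis_bound_bps(9000.0, 0.040, 1e-6);
	double observed_bps = 1.2e9;	/* measured goodput for the flow */

	if (observed_bps < 0.5 * floor_bps)	/* arbitrary 50% threshold */
		printf("flow under-performing: %.0f < %.0f bps\n",
		       observed_bps, floor_bps);
	return 0;
}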
> IMO these stats provide a general picture of how TCP works on a
> specific network, but not enough to really nail specific bugs in the
> TCP protocol or implementation. For that, SNMP stats or sampled pcap
> traces with offline analysis can achieve the same purpose.
I'd agree with that, but in the scenario we are most interested in,
protocol/implementation issues are secondary concerns. They are
important, but we've mostly been focused on what we can do to make the
scientific workflow easier when dealing with the transfer of large data
sets.