Message-ID: <Pine.LNX.4.64.0704262121120.18515@kivilampi-30.cs.helsinki.fi>
Date: Thu, 26 Apr 2007 22:18:47 +0300 (EEST)
From: "Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To: Rick Jones <rick.jones2@...com>
cc: Netdev <netdev@...r.kernel.org>
Subject: Re: Unexpected latency (order of 100-200 ms) with TCP (packet receive)
On Thu, 26 Apr 2007, Rick Jones wrote:
> Ilpo Järvinen wrote:
> >
> > ...
> > Some time ago I noticed that with 2.6.18 I occasionally get latency
> > spikes as long as 100-200ms in the TCP transfers between components
> > (I describe later how TCP was tuned during these tests to avoid
> > problems that occur with small segments). I started to investigate
> > the spikes, and here are the findings so far:
> > ...
> > - I placed a hub to get exact timings on the wire without potential
> > interference from tcpdump on the emulator host (test done with 2.6.18)
> > but to my great surprise, the problem vanished completely
>
> Sounds like tcpdump getting in the way?
No, tcpdump is not running on the host in question at all (that is what
I'm trying to say, but my English might betray me). I'm just describing
a test configuration I set up after I noticed that this 100-200ms
problem exists without any tcpdumps. With the hub, the network was
configured like this (I tried to avoid an additional Linux sender):
packet capture host ---> switch ---> hub ---> emulation host
                                      |
                                      |
                                     \ /
                             host with tcpdump
(for the record: due to the nature of the emulator, the sender in this
case is called "packet capture host", even though I don't mean that I
capture these tcpdumps on it.)
Sadly enough, the problem disappeared with this setup (the hub forced
eth into a 10Mbps network...). Because of that, the tcpdump I'm showing
below was captured with the traffic routed through another host
configuration (it has the additional sender that I had tried to avoid
entirely with the hub setup above):
         host with tcpdump, ip_forward = 1
            (10Mbps eth on purpose)
          / \               |
           |                |
           |               \ /
packet capture host ----> switch -----> emulation host
...the reason being that tcpdump on the emulation host could get in the
way, like you're saying. I ran it once there too, and it saw the packet
delayed by 100-130ms, which is probably a valid occurrence of this same
problem (too large to be due to scheduling or such, IMHO). The problem
also occurs without either of the tcpdump configurations presented here,
when traffic just goes directly from the packet capture host through the
switch to the emulation host.
> How many CPUs do you have in the system, and have you tried some
> explicit binding of processes to different CPUs? (taskset etc...)
One P4. Which processes do you mean? I've carefully disabled most of the
services (please read the explanation at the end of the original mail),
and the s/utimes (from /proc/<pid>/stat) of the remaining processes on
the emulation host do not increase during the test... Is it still
possible that they could have been scheduled but did not get s/utime
incremented? I'm not qualified to say, but perhaps you or somebody else
knows whether that is so.
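The utime/stime check described above can be scripted; here is a minimal
sketch (assuming a Linux procfs with the field layout documented in
proc(5)) that samples a process's accumulated CPU ticks before and after
a test run:

```python
import os

def cpu_ticks(pid):
    """Return (utime, stime) clock ticks for pid from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The comm field (2) may contain spaces/parens; it ends at the last ')'.
    rest = data[data.rindex(')') + 2:].split()
    # rest[0] is field 3 (state); utime is field 14, stime is field 15.
    return int(rest[11]), int(rest[12])

before = cpu_ticks(os.getpid())
# ... run the test here ...
after = cpu_ticks(os.getpid())
print(after[0] - before[0], after[1] - before[1])  # ticks consumed meanwhile
```

If both deltas stay at zero across the test, the process was charged no
user or system time, which is consistent with it not running (though, as
noted, a very short scheduling slice might not be charged a full tick).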
Besides, I have the very same configuration (no problem => without
tcpdump) working perfectly with 2.4, and also with 2.6 when the
emulation host eth is set to 10Mbps instead of 100Mbps using ethtool.
Could 2.6 with 100Mbps eth really cause 100ms latency spikes,
especially considering that my emulated wireless link is just 384kbps,
so the amount of data going past isn't much larger than that...
> When running tcpdump are you simply sending raw traces to a file, or are you
> having the ASCII redirected to a file?
> What about name resolution (ie areyou using -n)?
I used this command:
tcpdump -i eth0 -w tracefile host 192.168.1.3 or host 192.168.1.2
All hosts in the network should be in /etc/hosts, but I can try with -n
tomorrow. I doubt it will make any difference, though, because the
problem is there regardless of tcpdump. The only reason tcpdump is there
is to debug the problem between the emulation host and the sender, to
see whether the problem is in the sending end or in the receiving end.
> > To prove this with 100Mbps, I setup routing so that with a host with 10/FD
> > configuration (known, based on earlier, to be unlikely to cause errors) I
> > collected all traffic between the emulator host and one of the packet
> > capture hosts. Here is one example point where a long delay occurs (EMU is
> > the emulator host, in the real log each packet is shown twice, I only leave
> > the later one here):
> >
> > 1177577267.364596 IP CAP.35305 > EMU.52246: . 17231434:17232894(1460) ack
> > 383357 win 16293
> > 1177577267.364688 IP CAP.35305 > EMU.52246: P 17232894:17232946(52) ack
> > 383357 win 16293
> > 1177577267.366093 IP EMU.52246 > CAP.35305: . ack 17232894 win 32718
> > 1177577267.493815 IP EMU.52246 > CAP.35305: P 383357:383379(22) ack 17232894
> > win 32718
At this point the segment with seq 17232946 hasn't yet reached TCP (to
be more precise, it had not when this was sent), because the ACK
cumulatively acknowledges only up to 17232894! That is 129 ms after the
segment with seq 17232946 was seen by the tcpdump host (quite a long
time)...
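The gap can be checked directly from the two timestamps in the trace
above (a trivial arithmetic sketch, nothing kernel-specific):

```python
# Timestamps from the tcpdump trace above.
seg_seen = 1177577267.364688   # CAP's P 17232894:17232946(52) captured
ack_sent = 1177577267.493815   # EMU's ack 17232894, still excluding it
delay_ms = (ack_sent - seg_seen) * 1000
print(f"{delay_ms:.1f} ms")    # roughly 129 ms
```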
> > 1177577267.534252 IP CAP.35305 > EMU.52246: . ack 383379 win 16293
>
> What is the length of the standalone ACK timer these days?
Do you mean so called delayed ack timer?
kernels/linux/include$ grep "DELACK" */*
net/tcp.h:#define TCP_DELACK_MAX ((unsigned)(HZ/5))  /* maximal time to delay before sending an ACK */
net/tcp.h:#define TCP_DELACK_MIN ((unsigned)(HZ/25)) /* minimal time to delay before sending an ACK */
net/tcp.h:#define TCP_DELACK_MIN 4U
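Those limits are in jiffies, so the wall-clock values depend on HZ; a
small sketch of the conversion (assuming C-style integer division, and
ignoring the low-HZ fallback of TCP_DELACK_MIN to 4 jiffies):

```python
# Convert the TCP_DELACK_* jiffy limits to milliseconds for common HZ values.
for hz in (100, 250, 1000):
    delack_max_ms = (hz // 5)  * 1000 // hz   # TCP_DELACK_MAX = HZ/5
    delack_min_ms = (hz // 25) * 1000 // hz   # TCP_DELACK_MIN = HZ/25
    print(f"HZ={hz}: min {delack_min_ms} ms, max {delack_max_ms} ms")
```

With any of these HZ values the delayed-ACK timer maxes out at 200 ms,
i.e. the same ballpark as the 100-200 ms spikes, which is why the
delack timer is a plausible suspect for the wakeup side of this.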
I suspect the delayed ACK could be related in the sense that it is the
thing that wakes up the adapter for the first time after the delay (when
nothing else arrives from the network before the delack timer expires).
However, I'm fairly confident this problem is below the TCP level,
because the cumulative ACK seq in the segment at 1177577267.493815 does
not yet include the latest segment, which should have already arrived
(the tcpdump host saw it forwarded 129ms earlier).
> Might tcp_abc have crept back-in?
# cat /proc/sys/net/ipv4/tcp_abc
0
Thanks anyway for the ideas.
--
i.