Message-Id: <20071004.184437.07576505.takano@axe-inc.co.jp>
Date: Thu, 04 Oct 2007 18:44:37 +0900 (JST)
From: TAKANO Ryousei <takano@...-inc.co.jp>
To: netdev@...r.kernel.org
Cc: y-kodama@...t.go.jp
Subject: [RFC][PATCH 2.6.23-rc9 0/2] detection of loss of retransmitted
packets
Hi all,
We found a performance problem that occurs under heavy packet loss.
The cause appears to be in how the loss of retransmitted packets is
detected.
In the retransmission queue, the status of each sent packet is recorded.
When a packet is retransmitted, it is so marked, and snd_nxt (sequence
number of the next new (non-retransmission) packet to be sent) at that
moment is registered as ack_seq. A retransmitted packet is lost if it
is not SACKed, and its ack_seq is smaller than the sequence number of
any SACKed packet.
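
As a minimal sketch of the criterion described above (not the kernel code
itself): the struct and function names below are hypothetical, and
seq_before() is a local stand-in for a wraparound-safe sequence number
comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe test: is sequence number a before b? (local helper) */
static bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Hypothetical per-packet record; field names are illustrative only. */
struct rtx_entry {
    uint32_t seq;        /* first sequence number of the packet */
    uint32_t ack_seq;    /* snd_nxt recorded when it was retransmitted */
    bool retransmitted;
    bool sacked;
};

/*
 * A retransmitted packet that is still not SACKed is treated as lost
 * once SACKed data exists beyond the snd_nxt recorded at retransmission.
 * highest_sacked is the upper edge of the highest SACKed range.
 */
static bool retrans_lost(const struct rtx_entry *e, uint32_t highest_sacked)
{
    return e->retransmitted && !e->sacked &&
           seq_before(e->ack_seq, highest_sacked);
}
```

The intuition is that data sent after the retransmission has already been
received, so the retransmission itself should have arrived by then.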
An ACK packet can have up to three SACK blocks. A SACK block has a
"start sequence number (start_seq)" and an "end sequence number
(end_seq)" of received packets. In the current implementation of
tcp_sacktag_write_queue(), if an ACK packet has multiple SACK blocks,
the SACK blocks are sorted by start_seq in ascending order and
processed in that order. To scoreboard the packets in the retransmission
queue, the queue is scanned from snd_una (the lowest sequence
number of not yet ACKed packets) to the end_seq of the SACK block. To
optimize the scanning process, the next SACK block is processed not
from the snd_una but from the end_seq of the previously processed SACK
block. In the current implementation, for detecting the loss of
retransmitted packets, the ack_seq of a retransmitted packet is
compared with the end_seq of each SACK block during the scoreboarding.
Therefore, a retransmitted packet whose ack_seq is smaller than the
end_seq of the last SACK block but larger than that of the SACK block
currently being processed cannot be detected as lost.
Such undetected loss may eventually cause an RTO and performance may
be degraded.
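
The following is a simplified model of the scan described above, written
only to illustrate how the loss can be missed. It is not the code of
tcp_sacktag_write_queue(); struct pkt, struct sack_block and
scoreboard_per_block() are hypothetical names, and the queue is reduced to
a plain array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sack_block { uint32_t start_seq, end_seq; };

/* Hypothetical queue entry covering [seq, end_seq). */
struct pkt {
    uint32_t seq, end_seq;
    uint32_t ack_seq;            /* snd_nxt at retransmission time */
    bool retransmitted, sacked, lost;
};

static bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/*
 * q[] is ordered by seq and starts at snd_una; b[] is sorted by
 * start_seq in ascending order. The scan position i is carried over
 * from one block to the next, so earlier packets are not revisited.
 */
static void scoreboard_per_block(struct pkt *q, size_t nq,
                                 const struct sack_block *b, size_t nb)
{
    size_t i = 0;

    for (size_t k = 0; k < nb; k++) {
        for (; i < nq && seq_before(q[i].seq, b[k].end_seq); i++) {
            struct pkt *p = &q[i];

            /* Packet lies entirely inside the current block. */
            if (!seq_before(p->seq, b[k].start_seq) &&
                !seq_before(b[k].end_seq, p->end_seq)) {
                p->sacked = true;
                continue;
            }
            /* Loss check uses only the current block's end_seq. */
            if (p->retransmitted && !p->sacked &&
                seq_before(p->ack_seq, b[k].end_seq))
                p->lost = true;
        }
    }
}
```

For example, with SACK blocks [1100,1200) and [3100,3200), a retransmitted
packet at 1000 with ack_seq 3000 is visited only while the first block is
processed, where 3000 is not below 1200, and it is never compared against
3200.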
PATCH #1 fixes this problem by comparing the ack_seq with the
largest end_seq of the SACK blocks.
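
The idea can be sketched as follows. This is an illustration of the
comparison only, not the patch itself; highest_end_seq() and
retrans_lost_by_ack() are hypothetical helper names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sack_block { uint32_t start_seq, end_seq; };

static bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Largest end_seq over all SACK blocks of one ACK; requires nb >= 1. */
static uint32_t highest_end_seq(const struct sack_block *b, size_t nb)
{
    uint32_t hi = b[0].end_seq;

    for (size_t k = 1; k < nb; k++)
        if (seq_before(hi, b[k].end_seq))
            hi = b[k].end_seq;
    return hi;
}

/*
 * Loss test for a retransmitted, un-SACKed packet, evaluated against
 * the whole ACK rather than against whichever block is being scanned.
 */
static bool retrans_lost_by_ack(uint32_t ack_seq,
                                const struct sack_block *b, size_t nb)
{
    return nb > 0 && seq_before(ack_seq, highest_end_seq(b, nb));
}
```

Because the maximum is computed before the scan, the result no longer
depends on which block happens to cover the part of the queue where the
retransmitted packet sits.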
In addition, some of the SACK blocks in an ACK packet may already have
been reported in preceding ACK packets. PATCH #2 optimizes processing by
skipping such already reported SACK blocks. Usually, only the first
SACK block of an ACK packet is the new one to be processed.
Therefore, in most cases, applying PATCH #2 also solves the problem.
However, to ensure accurate processing in case there are multiple
new SACK blocks in an ACK packet, PATCH #2 should be applied in
conjunction with PATCH #1.
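
One possible way to skip already reported blocks is to remember the blocks
of the previous ACK and compare them exactly, as sketched below. This is
an assumption about the approach, not the contents of PATCH #2; struct
sack_cache, already_reported() and filter_new_blocks() are hypothetical
names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Room for the blocks of one ACK (a SACK option carries at most four). */
#define SACK_CACHE_SLOTS 4

struct sack_block { uint32_t start_seq, end_seq; };

/* Blocks seen in the previous ACK. */
struct sack_cache {
    struct sack_block blk[SACK_CACHE_SLOTS];
    size_t n;
};

static bool already_reported(const struct sack_cache *c,
                             const struct sack_block *b)
{
    for (size_t i = 0; i < c->n; i++)
        if (c->blk[i].start_seq == b->start_seq &&
            c->blk[i].end_seq == b->end_seq)
            return true;
    return false;
}

/*
 * Copies the blocks of the current ACK that were not in the previous
 * ACK into out[] (which must hold SACK_CACHE_SLOTS entries), then
 * remembers the current blocks. Returns the number of new blocks.
 */
static size_t filter_new_blocks(struct sack_cache *c,
                                const struct sack_block *in, size_t n,
                                struct sack_block *out)
{
    size_t n_new = 0;

    if (n > SACK_CACHE_SLOTS)
        n = SACK_CACHE_SLOTS;
    for (size_t i = 0; i < n; i++)
        if (!already_reported(c, &in[i]))
            out[n_new++] = in[i];
    for (size_t i = 0; i < n; i++)
        c->blk[i] = in[i];
    c->n = n;
    return n_new;
}
```

A block that has grown since the previous ACK no longer matches exactly
and is therefore treated as new, which is the common case for the first
block of an ACK.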
The experimental network is as follows:
Node A ----> Router -------> Delay -------> Node B
             (Policing rate: emulator
              500Mbps)       (RTT: 20ms)
You can find the details of our experimental setup at
http://projects.gtrc.aist.go.jp/gnet/sack-bug.html
We transferred 1 GByte of data from Node A to Node B ten times.
Here is the performance comparison of the cases with and without
these patches.
          Ave. goodput   Ave. RTO
2.6.22    376 Mbps       26
PATCH#1   481 Mbps       0
PATCH#2   483 Mbps       0
In the vanilla kernel, several RTOs (TCPTimeouts + TCPSackRecoveryFail)
occur. On the other hand, our patches eliminate RTOs and improve the
average goodput by about 28% (from 376 Mbps to 481-483 Mbps).
Any comments and ideas would be appreciated.
Regards,
Ryousei Takano