Message-ID: <45B8DBA5.9000500@hp.com>
Date: Thu, 25 Jan 2007 11:32:37 -0500
From: Vlad Yasevich <vladislav.yasevich@...com>
To: Steve Hill <steve.hill@...logic.com>
Cc: Sridhar Samudrala <sri@...ibm.com>, netdev@...r.kernel.org,
lksctp-developers@...ts.sourceforge.net
Subject: Re: [Lksctp-developers] Fw: Intermittent SCTP multihoming breakage
Hi Steve
Steve Hill wrote:
> On Wed, 10 Jan 2007, Sridhar Samudrala wrote:
>
>> So looks like there may be an issue with PR-SCTP(partial reliability)
>> support and packet loss. I will take a look into this.
>>
>> Do you still see this problem even if you don't set timetolive?
>
> No, the problem seems to go away if the timetolive is set to 0, so this is
> what I have now done since I had not intended to set the timetolive in the
> first place (but I thought it was still worth posting details of the
> problem since it does appear to be a bug).
>
I think I found this bug. It was rather interesting to figure out. The problem
appears to be that data messages time out within the RTO. As a result, they
move to the abandoned list and are never retransmitted. This clears the retransmit
list and the retransmit timer; however, the data is still charged as in-flight against
the association. This in turn prevents new data from being sent, since we are 'supposedly'
fully utilizing our congestion window.
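To make the accounting problem concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not the kernel code and not the attached patch: the class, its fields, and the `credit_flight` switch are all hypothetical names chosen for illustration.

```python
class ToyAssociation:
    """Toy model of sender-side accounting; not the kernel's data structures."""

    def __init__(self, cwnd):
        self.cwnd = cwnd          # congestion window, in bytes
        self.flight_size = 0      # bytes currently charged as in flight
        self.retransmit = []      # chunk sizes that may still be retransmitted
        self.abandoned = []       # chunk sizes given up on (PR-SCTP)

    def can_send(self, size):
        # New data is allowed only while the congestion window has room.
        return self.flight_size + size <= self.cwnd

    def send(self, size):
        if not self.can_send(size):
            return False
        self.flight_size += size
        self.retransmit.append(size)
        return True

    def abandon_all(self, credit_flight):
        # Lifetime expired before the RTO fired: move every chunk to the
        # abandoned list. credit_flight (hypothetical) chooses whether the
        # bytes are also removed from the in-flight count.
        for size in self.retransmit:
            if credit_flight:
                self.flight_size -= size
            self.abandoned.append(size)
        self.retransmit.clear()


def stalls_after_abandon(credit_flight):
    """Fill the window, abandon everything, report whether sending is blocked."""
    assoc = ToyAssociation(cwnd=3000)
    assoc.send(1500)
    assoc.send(1500)
    assoc.abandon_all(credit_flight)
    return not assoc.can_send(1500)
```

With `credit_flight=False` the model stays blocked even though nothing is left to retransmit, which matches the hang described above; with `credit_flight=True` the window opens again.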
Can you try the attached patch and let me know if the problem is fixed? You can
try reducing rto_max or path_max_retrans to get the failover to happen a little faster.
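As a rough guide to how those two settings affect failover time, the sketch below adds up the retransmission timeouts on a dead path. It assumes the RTO doubles after each timeout, is capped at rto_max, and that the path is given up on after path_max_retrans + 1 consecutive timeouts; the starting RTO depends on the measured round-trip time, so the value passed in is only an assumption. The function name is made up for this example.

```python
def failover_delay_ms(rto_start_ms, rto_max_ms, path_max_retrans):
    """Estimate the time until a dead path is declared inactive.

    Assumes exponential backoff capped at rto_max_ms and
    path_max_retrans + 1 consecutive timeouts before failover.
    """
    total = 0
    rto = min(rto_start_ms, rto_max_ms)
    for _ in range(path_max_retrans + 1):
        total += rto                      # wait for this timeout to expire
        rto = min(rto * 2, rto_max_ms)    # back off, but never past rto_max
    return total


# Starting RTO of 1 s, rto_max of 60 s, path_max_retrans of 5
print(failover_delay_ms(1000, 60000, 5))   # 63000
# Same, but with rto_max lowered to 2 s
print(failover_delay_ms(1000, 2000, 5))    # 11000
# Same as the first case, but with path_max_retrans lowered to 2
print(failover_delay_ms(1000, 60000, 2))   # 7000
```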
Regards
-vlad
View attachment "0001-SCTP-Fix-connection-hang-slowdown-with-PR-SCTP.txt" of type "text/plain" (2805 bytes)