Message-ID: <20140220124002.GB11199@hmsreliant.think-freely.org>
Date: Thu, 20 Feb 2014 07:40:02 -0500
From: Neil Horman <nhorman@...driver.com>
To: David Laight <David.Laight@...LAB.COM>
Cc: 'Daniel Borkmann' <dborkman@...hat.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-sctp@...r.kernel.org" <linux-sctp@...r.kernel.org>,
Gui Jianfeng <guijianfeng@...fujitsu.com>
Subject: Re: [PATCH net] net: sctp: fix multihoming retransmission path selection to rfc4960

On Thu, Feb 20, 2014 at 12:25:21PM +0000, David Laight wrote:
> From: Daniel Borkmann
> >
> > Problem statement: 1) both paths (primary path1 and alternate
> > path2) are up after the association has been established i.e.,
> > HB packets are normally exchanged, 2) path2 becomes inactive after
> > path_max_retrans * max_rto has elapsed (i.e. path2 is down completely),
> > 3) now, if a transmission times out on the only surviving/active
> > path1 (any ~1sec network service impact could cause this like
> > a channel bonding failover), then the retransmitted packets are
> > sent over the inactive path2; this happens with partial failover
> > and without it.
> >
> > Besides not being optimal in the above scenario, a small failure
> > or timeout in the only existing path has the potential to cause
> > long delays in the retransmission (depending on RTO_MAX) until
> > the still active path is reselected.
>
> The current behaviour doesn't seem very good - real networks tend
> to have non-zero packet loss these days (for all sorts of reasons).
>
> I guess that under moderate traffic flow retransmit requests from
> the remote system recover the data before a timeout actually occurs.
>
> That probably means that a path with a high error rate will continue
> to be used when an alternate path would be much better.
>
Not really sure what you mean here. Why would we use a path with a high error
rate when another one would be much better? If we get too many retransmits on
the current active path, we select a different one, using the collected
metrics to determine which path is most preferable.
> I was wondering whether it is valid (or even reasonable) to send
> the retransmit down multiple paths? Particularly if they are
> not known to be working.
Yes, quick failover defines that behavior:
http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05
And if it's not appropriate for your network, you can disable it via sysctl.
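A rough C sketch of the per-path state machine the quick-failover draft describes, under simplifying assumptions: each timeout increments an error counter, exceeding a "potentially failed" threshold moves the path to PF (roughly, not used for new data while an active path exists), exceeding Path.Max.Retrans marks it inactive, and an acknowledgement or HEARTBEAT-ACK resets it to active. The threshold names `pf_max` and `path_max` are hypothetical stand-ins, not the kernel's sysctl names. In this model, setting `pf_max` to `path_max` or higher means PF is never entered, which is one way a "disable" knob can work; it makes no claim about the exact kernel semantics.

```c
#include <assert.h>

/* Hypothetical, simplified per-destination state (illustration only). */
enum path_state { PATH_ACTIVE, PATH_PF, PATH_INACTIVE };

struct path {
    enum path_state state;
    unsigned int error_count;
};

/* Hypothetical thresholds: pf_max ~ the draft's PF threshold,
 * path_max ~ RFC 4960 Path.Max.Retrans. */
struct thresholds {
    unsigned int pf_max;
    unsigned int path_max;
};

/* T3-rtx or heartbeat timeout on this path. */
void path_on_timeout(struct path *p, const struct thresholds *t)
{
    assert(p != 0 && t != 0);
    p->error_count++;
    if (p->error_count > t->path_max)
        p->state = PATH_INACTIVE;   /* counter exceeded Path.Max.Retrans */
    else if (p->error_count > t->pf_max)
        p->state = PATH_PF;         /* potentially failed: avoid for data */
}

/* Data acknowledged or HEARTBEAT-ACK received on this path. */
void path_on_ack(struct path *p)
{
    assert(p != 0);
    p->error_count = 0;
    p->state = PATH_ACTIVE;
}
```

With `pf_max = 1` and `path_max = 3`, the second timeout moves the path to PF and the fourth marks it inactive, while a path tuned with `pf_max = path_max = 3` goes straight from active to inactive.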
> Or maybe resend heartbeats in a desperate attempt to find a working
> path?
>
> Do you guys know which kernel version(s) have that patch?
Which patch? What Daniel describes above has been the behavior for some time,
IIRC.
> We have a few customers using sctp (for m3ua) and I really ought
> to keep track of the 'good' and 'bad' kernel versions.
>
> David
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html