netdev - RE: [PATCH net] net: sctp: fix multihoming retransmission path selection to rfc4960

Message-ID: <063D6719AE5E284EB5DD2968C1650D6D0F6C762B@AcuExch.aculab.com>
Date:	Thu, 20 Feb 2014 12:25:21 +0000
From:	David Laight <David.Laight@...LAB.COM>
To:	'Daniel Borkmann' <dborkman@...hat.com>,
	"davem@...emloft.net" <davem@...emloft.net>
CC:	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"linux-sctp@...r.kernel.org" <linux-sctp@...r.kernel.org>,
	Gui Jianfeng <guijianfeng@...fujitsu.com>
Subject: RE: [PATCH net] net: sctp: fix multihoming retransmission path
 selection to rfc4960

From: Daniel Borkmann
> 
> Problem statement: 1) both paths (primary path1 and alternate
> path2) are up after the association has been established i.e.,
> HB packets are normally exchanged, 2) path2 gets inactive after
> path_max_retrans * max_rto timed out (i.e. path2 is down completely),
> 3) now, if a transmission times out on the only surviving/active
> path1 (any ~1sec network service impact could cause this like
> a channel bonding failover), then the retransmitted packets are
> sent over the inactive path2; this happens with partial failover
> and without it.
> 
> Besides not being optimal in the above scenario, a small failure
> or timeout in the only existing path has the potential to cause
> long delays in the retransmission (depending on RTO_MAX) until
> the still active path is reselected.
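
To make the scenario above concrete, here is a rough sketch of the
selection rule as I read RFC 4960 section 6.4.1: retransmit on a
different destination only if that destination is active, otherwise
stay on the path already in use. This is illustrative Python, not the
kernel code; Path and pick_retransmit_path are made-up names.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """Hypothetical stand-in for a per-destination transport."""
    name: str
    active: bool


def pick_retransmit_path(paths, last_used):
    """Return the path to retransmit on after a timeout on last_used.

    Prefer an active path other than the one that just timed out.
    If no such path exists, keep using last_used instead of moving
    the retransmission to a path that is known to be inactive.
    """
    for path in paths:
        if path.active and path is not last_used:
            return path
    return last_used


path1 = Path("path1", active=True)
path2 = Path("path2", active=False)

# path2 is down, so a timeout on path1 must not move traffic to path2.
chosen = pick_retransmit_path([path1, path2], last_used=path1)
print(chosen.name)
```

With path2 marked inactive the retransmission stays on path1; if path2
comes back it becomes the preferred alternate again.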

The current behaviour doesn't seem very good - real networks tend
to have non-zero packet loss these days (for all sorts of reasons).

I guess that, under moderate traffic flow, retransmit requests from
the remote system recover the data before a timeout actually occurs.

That probably means that a path with a high error rate will continue
to be used when an alternate path would be much better.

I was wondering whether it is valid (or even reasonable) to send
the retransmit down multiple paths, particularly if they are
not known to be working?
Or maybe resend heartbeats in a desperate attempt to find a working
path?

Do you guys know which kernel version(s) have that patch?
We have a few customers using sctp (for m3ua) and I really ought
to keep track of the 'good' and 'bad' kernel versions.

	David
