Date:	Wed, 22 Jan 2014 13:41:01 +0000
From:	David Laight <David.Laight@...LAB.COM>
To:	'Matija Glavinic Pecotic' <matija.glavinic-pecotic.ext@....com>,
	"linux-sctp@...r.kernel.org" <linux-sctp@...r.kernel.org>
CC:	Alexander Sverdlin <alexander.sverdlin@....com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [PATCH] net: sctp: Fix a_rwnd/rwnd management to reflect real
 state of the receiver's buffer

From: Matija Glavinic Pecotic
> The implementation of the (a)rwnd calculation can lead to severe performance
> issues and to associations stalling completely. These problems are described
> below and a solution is proposed which improves lksctp's robustness under
> congestion.
> 
> 1) Sudden drop of a_rwnd and incomplete window recovery afterwards
> 
> The data accounted in sctp_assoc_rwnd_decrease covers only the payload size
> (SCTP data), but the size of the sk_buff, which is charged against the
> receiver buffer, is not accounted for in rwnd. In theory this should not be a
> problem, as the actual buffer size is double the amount requested on the
> socket (SO_RCVBUF). The problem is that this scales badly for data smaller
> than sizeof(sk_buff). E.g. in 4G (LTE) networks, the link interfacing the
> radio side will carry a large portion of traffic of this size (less than 100B).
...
> 
> Proposed solution:
> 
> Both problems share the same root cause: improper scaling of the socket
> buffer with rwnd. A solution in which sizeof(sk_buff) is taken into account
> when calculating rwnd is not possible, because there is no linear
> relationship between the amount of data charged on increase/decrease and the
> IP packet in which the payload arrived. Even if such a solution were
> followed, the complexity of the code would increase. Given the nature of the
> current rwnd handling, the slow increase of rwnd (in
> sctp_assoc_rwnd_increase) after the pressure state is entered is rational,
> but it gives the sender a false picture of the current buffer space.
> Furthermore, it implements an additional congestion control mechanism which
> is defined by the implementation rather than by the standard.
> 
> The proposed solution simplifies the whole algorithm, keeping in mind the
> definition from the RFC:
> 
> o  Receiver Window (rwnd): This gives the sender an indication of the space
>    available in the receiver's inbound buffer.
> 
> The core of the proposed solution is given by these lines:
> 
> sctp_assoc_rwnd_account:
> 	if ((asoc->base.sk->sk_rcvbuf - rx_count) > 0)
> 		asoc->rwnd = (asoc->base.sk->sk_rcvbuf - rx_count) >> 1;
> 	else
> 		asoc->rwnd = 0;
> 
> We advertise to the sender (half of) the actual space we have. "Half" is in
> parentheses because it depends on whether you consider the size of the socket
> buffer to be the SO_RCVBUF value, i.e. the size visible from userspace, or
> twice that amount, i.e. the size used in kernelspace.
> In this way the sender is given a good approximation of our buffer space,
> regardless of the buffer policy - we always advertise what we have. The
> proposed solution fixes the described problems and removes the need for the
> rwnd restoration algorithm. Finally, since the proposed solution is a
> simplification, some lines of code, along with some bytes in
> struct sctp_association, are saved.
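
For reference, the quoted sctp_assoc_rwnd_account logic boils down to
something like the standalone sketch below (names are illustrative, not the
kernel's; 'rcvbuf' stands for the kernel-doubled sk_rcvbuf and 'rx_count' for
whatever the quoted code currently charges against it):

	static unsigned int advertised_rwnd(int rcvbuf, int rx_count)
	{
		int space = rcvbuf - rx_count;

		/* Half of the in-kernel free space, i.e. roughly the free
		 * space measured against the userspace SO_RCVBUF value. */
		return space > 0 ? (unsigned int)space / 2 : 0;
	}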

IIRC the 'size' charged against the socket buffer is the skb's 'truesize', and
that includes any padding before and after the actual rx data. For short
packets the driver may have copied the data into a smaller skb; for long
packets it is likely to be more than that of a full-length ethernet packet.
In either case it can be significantly more than sizeof(sk_buff) (190?) plus
the size of the actual data.
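
To put rough numbers on that (the per-packet charge below is a guess for
illustration only, not a figure from any particular kernel or driver):

	#include <stdio.h>

	int main(void)
	{
		unsigned int rcvbuf   = 2 * 163840; /* assumed: doubled SO_RCVBUF */
		unsigned int payload  = 100;        /* small chunk, e.g. LTE signalling */
		unsigned int truesize = 768;        /* assumed per-packet buffer charge */

		printf("packets that really fit (by truesize): %u\n",
		       rcvbuf / truesize);
		printf("packets payload-only rwnd accounting assumes fit: %u\n",
		       rcvbuf / payload);
		return 0;
	}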

I'm also not sure that continuously removing 'credit' is a good idea.
I've done a lot of comms protocol code; removing credit and 'window
slamming' acks are not good ideas.

Perhaps the advertised window should be bounded by the configured socket
buffer size, and only reduced if the actual space isn't likely to be large
enough given the typical overhead of the received data.
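
As a rough userspace sketch of what I mean - the names and the overhead
estimate are made up, this is not a patch:

	static unsigned int bounded_rwnd(unsigned int configured_rcvbuf,
					 unsigned int charged_truesize,
					 unsigned int overhead_pct)
	{
		/* Space left in the (kernel-doubled) buffer. */
		unsigned int space = 2 * configured_rcvbuf > charged_truesize ?
				     2 * configured_rcvbuf - charged_truesize : 0;
		/* Payload that space can realistically hold once an assumed
		 * typical per-packet overhead is discounted. */
		unsigned int usable = space - space * overhead_pct / 100;

		/* Advertise the full configured size while it still fits;
		 * reduce only when the discounted space falls short of it. */
		return usable >= configured_rcvbuf ? configured_rcvbuf : usable;
	}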

Similarly, as the window is opened after congestion it should be increased
by the amount of data actually removed (not the number of free bytes).
When there is a significant amount of space the window could be increased
faster - allowing a smaller number of larger skbs carrying more data to be
queued.
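
Something along these lines, again purely illustrative and nothing like the
real lksctp code paths:

	static unsigned int open_window(unsigned int current_rwnd,
					unsigned int bytes_read,       /* payload handed to the user */
					unsigned int configured_rcvbuf,
					unsigned int free_space)       /* current buffer headroom */
	{
		unsigned int credit = bytes_read;
		unsigned int rwnd;

		/* With plenty of headroom, open the window faster so that
		 * fewer, larger skbs carrying more data can be queued. */
		if (free_space > configured_rcvbuf / 2)
			credit *= 2;

		rwnd = current_rwnd + credit;
		return rwnd < configured_rcvbuf ? rwnd : configured_rcvbuf;
	}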

As a matter of interest, how does TCP handle this?

	David
