Message-ID: <534ED0FD.4040709@gmail.com>
Date: Wed, 16 Apr 2014 14:50:37 -0400
From: Vlad Yasevich <vyasevich@...il.com>
To: Alexander Sverdlin <alexander.sverdlin@....com>,
ext Dongsheng Song <dongsheng.song@...il.com>,
Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@....com>
CC: Daniel Borkmann <dborkman@...hat.com>, davem@...emloft.net,
netdev@...r.kernel.org,
"linux-sctp@...r.kernel.org" <linux-sctp@...r.kernel.org>
Subject: Re: [PATCH net] Revert "net: sctp: Fix a_rwnd/rwnd management to
reflect real state of the receiver's buffer"
On 04/16/2014 05:02 AM, Alexander Sverdlin wrote:
> Hi Dongsheng!
>
> On 16/04/14 10:39, ext Dongsheng Song wrote:
>> From my testing, netperf throughput drops from 600 Mbit/s to 6 Mbit/s,
>> a penalty of 99 %.
>
> The question was, do you see this as a problem of the new rwnd algorithm?
> If yes, how exactly?
The algorithm isn't wrong, but the implementation appears to have
a bug with window update SACKs.  The problem is that
sk->sk_rmem_alloc is only decremented by the skb destructor when
the skb is freed, and that happens after we call
sctp_assoc_rwnd_update(), which tries to send the update SACK.
As a result, in the default config with per-socket accounting,
the test

	if ((asoc->base.sk->sk_rcvbuf - rx_count) > 0)

uses a stale (too large) value for rx_count, so we advertise a
smaller rwnd than what is really available.
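
Here is a minimal userspace model of that ordering (the variable
names mirror the kernel fields, but the buffer sizes, the 40000-byte
event, and the compute_rwnd() helper are made up for illustration):

#include <stdio.h>

/* Toy model: sk_rmem_alloc is only decremented by the skb
 * "destructor", which in the buggy path runs after the window
 * has already been computed.
 */
static int sk_rcvbuf = 65536;		/* receive buffer limit */
static int sk_rmem_alloc = 40000;	/* bytes currently accounted */

static int compute_rwnd(void)
{
	int rx_count = sk_rmem_alloc;

	return (sk_rcvbuf - rx_count) > 0 ? sk_rcvbuf - rx_count : 0;
}

int main(void)
{
	/* Buggy order: advertise first, then free the 40000-byte skb. */
	int stale = compute_rwnd();	/* 25536, too small */
	sk_rmem_alloc -= 40000;		/* destructor finally runs */

	/* Fixed order: free (and account) first, then advertise. */
	int fixed = compute_rwnd();	/* 65536, the real free space */

	printf("stale rwnd = %d, fixed rwnd = %d\n", stale, fixed);
	return 0;
}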
Can you try this patch without the revert applied?
Thanks
-vlad
diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
index 8d198ae..cc2d440 100644
--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -1011,7 +1011,6 @@ static void sctp_ulpevent_release_data(struct sctp_ulpevent *event)
 {
 	struct sk_buff *skb, *frag;
 	unsigned int len;
-	struct sctp_association *asoc;
 
 	/* Current stack structures assume that the rcv buffer is
 	 * per socket. For UDP style sockets this is not true as
@@ -1036,11 +1035,7 @@ static void sctp_ulpevent_release_data(struct sctp_ulpevent *event)
 	}
 
 done:
-	asoc = event->asoc;
-	sctp_association_hold(asoc);
 	sctp_ulpevent_release_owner(event);
-	sctp_assoc_rwnd_update(asoc, true);
-	sctp_association_put(asoc);
 }
 
 static void sctp_ulpevent_release_frag_data(struct sctp_ulpevent *event)
@@ -1071,12 +1066,21 @@ done:
  */
 void sctp_ulpevent_free(struct sctp_ulpevent *event)
 {
+	struct sctp_association *asoc = event->asoc;
+
 	if (sctp_ulpevent_is_notification(event))
 		sctp_ulpevent_release_owner(event);
 	else
 		sctp_ulpevent_release_data(event);
 
 	kfree_skb(sctp_event2skb(event));
+	/* The socket is locked and the association can't go anywhere
+	 * since we are walking the ulpqueue. No need to hold
+	 * another ref on the association. Now that the skb has been
+	 * freed and accounted for everywhere, see if we need to send
+	 * a window update SACK.
+	 */
+	sctp_assoc_rwnd_update(asoc, true);
 }
 
 /* Purge the skb lists holding ulpevents. */
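
To check for the regression, the netperf comparison mentioned above
can be rerun with the patch applied. A rough sketch (the patch file
name and the host are placeholders, and the SCTP_STREAM test assumes
netperf was built with SCTP support):

	# in a tree that does NOT have the revert applied:
	git apply --check sctp-rwnd-fix.patch && git apply sctp-rwnd-fix.patch

	# receiver side:
	netserver

	# sender side, 30-second SCTP bulk transfer:
	netperf -H <receiver> -t SCTP_STREAM -l 30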
> The algorithm actually has no preference to any amount of data.
> It was fine-tuned before to serve as a congestion control algorithm, but
> this should be located elsewhere. Perhaps, indeed, a re-use of congestion
> control modules from TCP would be possible...
>
>> http://www.spinics.net/lists/linux-sctp/msg03308.html
>>
>>
>> On Wed, Apr 16, 2014 at 2:57 PM, Matija Glavinic Pecotic
>> <matija.glavinic-pecotic.ext@....com> wrote:
>>>
>>> Hello Vlad,
>>>
>>> On 04/14/2014 09:57 PM, ext Vlad Yasevich wrote:
>>>> The base approach is sound. The idea is to calculate rwnd based
>>>> on the receiver buffer available. The algorithm chosen, however,
>>>> gives a much higher preference to small data and penalizes large
>>>> data transfers. We need to figure out something else here...
>>>
>>> I don't follow you here. Could you please explain what you see as
>>> the penalty?
>>>
>>> Thanks,
>>>
>>> Matija
>>
>>
>