Message-ID: <64ab0f8a-e093-1adc-82f3-ffd8d6a30c7a@unl.edu>
Date: Fri, 21 Jul 2017 15:26:56 -0500
From: Lisong Xu <xu@....edu>
To: Yuchung Cheng <ycheng@...gle.com>, Wei Sun <unlcsewsun@...il.com>
Cc: netdev <netdev@...r.kernel.org>
Subject: Re: A buggy behavior for Linux TCP Reno and HTCP

Hi Yuchung,

This test scenario is only one example that triggers this bug. In
general, the undo function has this bug whenever cwnd < 4: ssthresh is
clamped to a minimum of 2, so the doubled value 2 * ssthresh = 4
exceeds any pre-loss cwnd below 4.

This would not be a problem in a normal network, but it might be an
issue if the network is highly congested (e.g., many TCP flows, each
with cwnd < 4). In that case, the bug may mistakenly double the
sending rate of each flow and make a highly congested network even
more congested... similar to congestion collapse. This is actually why
we need congestion control algorithms in the first place.
Thanks
Lisong
On 7/21/2017 12:59 PM, Yuchung Cheng wrote:
> On Thu, Jul 20, 2017 at 2:28 PM, Wei Sun <unlcsewsun@...il.com> wrote:
>> Hi Yuchung,
>>
>> Sorry for the confusion. The test case was adapted from an old DSACK
>> test case (i.e., forget to remove something).
>>
>> Attached is a new and simple one. Thanks
> Note that the test scenario is fairly rare IMO: the connection first
> experiences timeouts, then its retransmission gets acked, and then the
> original packets get acked (ack w/ val 1400 ecr 130). That would take
> really long reordering, or reordering plus packet loss.
>
> The Linux undo state machines may not handle this perfectly, but it's
> probably not worth extra state for such rare events.
>
>>
>>
>> On Wed, Jul 19, 2017 at 2:31 PM, Yuchung Cheng <ycheng@...gle.com> wrote:
>>> On Tue, Jul 18, 2017 at 2:36 PM, Wei Sun <unlcsewsun@...il.com> wrote:
>>>> Hi there,
>>>>
>>>> We find a buggy behavior when using Linux TCP Reno and HTCP in low
>>>> bandwidth or highly congested network environments.
>>>>
>>>> Simply put, their undo functions may mistakenly double the cwnd,
>>>> leading to more aggressive behavior in a highly congested scenario.
>>>>
>>>>
>>>> The detailed reason:
>>>>
>>>> The current Reno undo function assumes the cwnd was halved on loss
>>>> (and thus doubles it on undo), but it doesn't consider the corner
>>>> case in which ssthresh is clamped to its minimum of 2, so the cwnd
>>>> was not actually halved.
>>>>
>>>> e.g.,
>>>>                      cwnd   ssthresh
>>>> An initial state:      2       5
>>>> A spurious loss:       1       2
>>>> Undo:                   4       5
>>>>
>>>> Here the cwnd after undo (4) is twice the cwnd before the spurious
>>>> loss (2). Attached is a simple script to reproduce it.
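>>>>
>>>> For reference, the two Reno hooks in question look roughly like this
>>>> (a simplified sketch of the current logic, not verbatim kernel
>>>> source):
>>>>
>>>> u32 tcp_reno_ssthresh(struct sock *sk)
>>>> {
>>>>         const struct tcp_sock *tp = tcp_sk(sk);
>>>>
>>>>         /* Halve cwnd, but never go below 2: with cwnd = 2 this
>>>>          * returns 2, i.e. cwnd is not actually halved.
>>>>          */
>>>>         return max(tp->snd_cwnd >> 1U, 2U);
>>>> }
>>>>
>>>> u32 tcp_reno_undo_cwnd(struct sock *sk)
>>>> {
>>>>         const struct tcp_sock *tp = tcp_sk(sk);
>>>>
>>>>         /* Undo by doubling ssthresh, assuming ssthresh == cwnd / 2.
>>>>          * With the clamp above, 2 * ssthresh = 4 exceeds the
>>>>          * pre-loss cwnd of 2, as in the table above.
>>>>          */
>>>>         return max(tp->snd_cwnd, tp->snd_ssthresh << 1);
>>>> }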
>>> the packetdrill script is a bit confusing: it disables SACK, but the
>>> client then returns ACKs w/ SACK blocks; also, the 3 dupacks happen
>>> after the RTO, so the sender isn't technically going through fast
>>> recovery...
>>>
>>> could you provide a better test?
>>>
>>>> A similar issue applies to HTCP, so we recommend storing the cwnd at
>>>> loss time in the .ssthresh implementation and restoring it in
>>>> .undo_cwnd for both the TCP Reno and HTCP implementations.
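>>>>
>>>> A minimal sketch of that suggestion for Reno (assuming a field such
>>>> as tp->prior_cwnd is available to carry the saved value):
>>>>
>>>> u32 tcp_reno_ssthresh(struct sock *sk)
>>>> {
>>>>         struct tcp_sock *tp = tcp_sk(sk);
>>>>
>>>>         tp->prior_cwnd = tp->snd_cwnd;  /* remember pre-loss cwnd */
>>>>         return max(tp->snd_cwnd >> 1U, 2U);
>>>> }
>>>>
>>>> u32 tcp_reno_undo_cwnd(struct sock *sk)
>>>> {
>>>>         const struct tcp_sock *tp = tcp_sk(sk);
>>>>
>>>>         /* Restore exactly the saved cwnd instead of 2 * ssthresh,
>>>>          * so undo can never overshoot the pre-loss value.
>>>>          */
>>>>         return max(tp->snd_cwnd, tp->prior_cwnd);
>>>> }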
>>>>
>>>> Thanks