lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 12 Dec 2011 14:40:36 -0500
From:	Neal Cardwell <ncardwell@...gle.com>
To:	Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
Cc:	David Miller <davem@...emloft.net>,
	Netdev <netdev@...r.kernel.org>, nanditad@...gle.com,
	Yuchung Cheng <ycheng@...gle.com>, therbert@...gle.com
Subject: Re: [PATCH 1/1] tcp: fix spurious undos in CA_Open

On Mon, Dec 12, 2011 at 7:05 AM, Ilpo Järvinen
<ilpo.jarvinen@...sinki.fi> wrote:
> On Mon, 28 Nov 2011, Neal Cardwell wrote:
>
>> On Sun, Nov 27, 2011 at 6:58 PM, David Miller <davem@...emloft.net> wrote:
>>
>> >> Also, one (serious) word of caution! This change, by its extending of
>> >> CA_Open state, is somewhat similar to what I made FRTO years ago, and
>> >> managed to introduces subtle breaking to the state machine. Please check
>> >> that the problem similar to what was fixed by
>> >> a1c1f281b84a751fdb5ff919da3b09df7297619f does not reappear (possibly in
>> >> some other form causing spurious undos). ...To me it seems that
>> >> tcp_packet_delayed might be similarly confused after the given patch.
>> >
>> > Neil, please look into this so we can have any such issues fixed
>> > in time for the next merge window.
>>
>> Absolutely. I will look into the areas that Ilpo mentioned.
>
> Unfortunately nothing has happened? So I finally had to come up something
> myself. ...Compile tested only.

Sorry for the radio silence, but I have been looking at this. So far
it's been a time-consuming process, as we've uncovered a number of
pre-existing issues in tcp_packet_delayed that seem like quite serious
bugs:

1) tcp_packet_delayed implicitly assumes, but does not verify, that
the ACK in question (whose ECR field precedes retrans_stamp) covers
all retransmitted segments. So if we retransmit segments 1 and 2, and
then get an ACK for 1 that has an ECR indicating it was for the
original copy of segment 1, then we undo,  even in cases where segment
2 really was lost and needed to be retransmitted.

2) tcp_packet_delayed implicitly assumes that if we retransmit a
packet that was truly lost then any ACK for the retransmitted instance
of that packet will echo the timestamp field in the retransmitted
packet. But if the receiver is implementing delayed ACKs and correctly
implements timestamps per RFC 1323, then the receiver will be obeying
this clause: "Thus, when delayed ACKs are in use, the receiver should
reply with the TSval field from the earliest unacknowledged segment."
Thus tcp_packet_delayed can cause spurious undos in the following
circumstances:

- send segment 1 with tsval 10
- receiver is doing delayed ACKs and does not ACK 1 but sets TS.Recent to 10
- send segment 2 with tsval 20
- segment 2 is lost in transit
- RTO, retransmit segment 2 with tsval 30
- receiver sends delayed ACK for packets 1 and 2 with TS.Recent of 10
- sender receives ACK for packet 2 with tsecho of 10, lower than 30,
and thinks RTO was spurious

For better or worse, both of these bugs appear to be inherited from
the Eifel Detection Algorithm RFC (RFC 3522).

In any case, we've been working on coming up with a solution for these issues.

> --
> [PATCH 1/1] tcp: fix spurious undos in CA_Open
>
> There's a landmine in tcp_packet_delayed. In CA_Open the
> retrans_stamp is very eagerly cleared which falsely triggers
> packet-delayed detection. Therefore this code cannot be used
> in CA_Open if the retrans_stamp was already cleared.
>
> This became issue once DSACK detection was extended to happen
> in CA_Open state too (f698204bd0b, tcp: allow undo from
> reordered DSACKs). Essentially the whole congestion control
> is disabled because the undos now always restore the previous
> window. I wonder if this was already in production... ...at
> least the Internet didn't melt ;-).

I'm pretty sure this commit is unnecessary. It seems like a NOP. Note
that tcp_may_undo and tcp_packet_delayed are never called in state
TCP_CA_Open. tcp_may_undo is only called from tcp_try_undo_recovery,
tcp_try_undo_partial, and tcp_try_undo_loss, and to call any of these
functions you must be in either TCP_CA_Loss or TCP_CA_Recovery. Only
tcp_try_undo_dsack is called from TCP_CA_Open, but it makes no
reference to retrans_stamp.

Please let me know if I'm missing something, or if there's a specific
scenario you're concerned about.

thanks,
neal
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ