netdev - Re: [PATCH] tcp: Fix highest_sack and highest_sack

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <55BD3A1D-746E-4B04-96D6-BDF2D3D92736@linux.alibaba.com>
Date:   Sat, 28 Dec 2019 12:26:07 +0800
From:   Cambda Zhu <cambda@...ux.alibaba.com>
To:     Eric Dumazet <eric.dumazet@...il.com>
Cc:     Eric Dumazet <edumazet@...gle.com>,
        David Miller <davem@...emloft.net>,
        Yuchung Cheng <ycheng@...gle.com>,
        netdev <netdev@...r.kernel.org>,
        Dust Li <dust.li@...ux.alibaba.com>
Subject: Re: [PATCH] tcp: Fix highest_sack and highest_sack_seq



> On Dec 28, 2019, at 11:28, Eric Dumazet <eric.dumazet@...il.com> wrote:
> 
> 
> 
> On 12/27/19 4:49 PM, Eric Dumazet wrote:
>> 
>> 
>> On 12/27/19 12:52 AM, Cambda Zhu wrote:
>>> From commit 50895b9de1d3 ("tcp: highest_sack fix"), the logic about
>>> setting tp->highest_sack to the head of the send queue was removed.
>>> Of course the logic is error prone, but it is logical. Before we
>>> remove the pointer to the highest sack skb and use the seq instead,
>>> we need to set tp->highest_sack to NULL when there is no skb after
>>> the last sack, and then replace NULL with the real skb when new skb
>>> inserted into the rtx queue, because the NULL means the highest sack
>>> seq is tp->snd_nxt. If tp->highest_sack is NULL and new data sent,
>>> the next ACK with sack option will increase tp->reordering unexpectedly.
>>> 
>>> This patch sets tp->highest_sack to the tail of the rtx queue if
>>> it's NULL and new data is sent. The patch keeps the rule that the
>>> highest_sack can only be maintained by sack processing, except for
>>> this only case.
>>> 
>>> Fixes: 50895b9de1d3 ("tcp: highest_sack fix")
>>> Signed-off-by: Cambda Zhu <cambda@...ux.alibaba.com>
>>> ---
>>> net/ipv4/tcp_output.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>> 
>>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>>> index 1f7735ca8f22..58c92a7d671c 100644
>>> --- a/net/ipv4/tcp_output.c
>>> +++ b/net/ipv4/tcp_output.c
>>> @@ -72,6 +72,9 @@ static void tcp_event_new_data_sent(struct sock *sk, struct sk_buff *skb)
>>> 	__skb_unlink(skb, &sk->sk_write_queue);
>>> 	tcp_rbtree_insert(&sk->tcp_rtx_queue, skb);
>>> 
>>> +	if (tp->highest_sack == NULL)
>>> +		tp->highest_sack = skb;
>>> +
>>> 	tp->packets_out += tcp_skb_pcount(skb);
>>> 	if (!prior_packets || icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
>>> 		tcp_rearm_rto(sk);
>>> 
>> 
>> 
>> This patch seems to keep something in the fast path, even for flows never experiencing
>> sacks.
>> 
>> Why would we always painfully maintain tp->highest_sack to the left most skb in the rtx queue ?
>> 
>> Given that tcp_highest_sack_seq() has an explicit check about tp->highest_sack being NULL,
>> there is something I do not quite understand yet.
>> 
>> Why keeping this piece of code ?
>> 
>>    if (tp->highest_sack == NULL)
>>            return tp->snd_nxt;
>> 
>> Defensive programming should be replaced by better knowledge.
>> 
>> Can you provide more explanations, or maybe a packetdrill test ?
>> 
>> Maybe some other path (in slow path this time) misses a !tp->highest_sack test.
>> 
>> Thanks.
>> 
> 
> Or maybe the real bug has been latent for years.
> 
> (added in commit 6859d49475d4 "[TCP]: Abstract tp->highest_sack accessing & point to next skb" )
> 
> 
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index e460ea7f767ba627972a63a974cae80357808366..32781fb5cf3a7aa1158c98cb87754b59dc922b1f 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1843,12 +1843,9 @@ static inline void tcp_push_pending_frames(struct sock *sk)
>  */
> static inline u32 tcp_highest_sack_seq(struct tcp_sock *tp)
> {
> -       if (!tp->sacked_out)
> +       if (!tp->sacked_out || !tp->highest_sack)
>                return tp->snd_una;
> 
> -       if (tp->highest_sack == NULL)
> -               return tp->snd_nxt;
> -
>        return TCP_SKB_CB(tp->highest_sack)->seq;
> }

The key is that we use skb but not seq to save the highest sack yet. I think it should be done in
a better way. If we replace the next skb with the last sacked skb or seq after it, we need to change
all of the sack processing about highest sack, which is worth to do in my opinion.

Thanks.