netdev - Re: Crash when receiving FIN-ACK in TCP_FIN


Message-ID: <cae50d97-5d19-7b35-0e82-630f905c1bf6@gmail.com>
Date:   Wed, 30 Oct 2019 18:27:27 -0700
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     Josh Hunt <johunt@...mai.com>,
        Subash Abhinov Kasiviswanathan <subashab@...eaurora.org>,
        Neal Cardwell <ncardwell@...gle.com>
Cc:     Netdev <netdev@...r.kernel.org>, Yuchung Cheng <ycheng@...gle.com>,
        Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: Crash when receiving FIN-ACK in TCP_FIN_WAIT1 state



On 10/30/19 2:48 PM, Josh Hunt wrote:
> On 10/30/19 11:27 AM, Subash Abhinov Kasiviswanathan wrote:
>>> Thanks. Do you mind sharing what your patch looked like, so we can
>>> understand precisely what was changed?
>>>
>>> Also, are you able to share what the workload looked like that tickled
>>> this issue? (web client? file server?...)
>>
>> Sure. This was seen only on our regression racks and the workload there
>> is a combination of FTP, browsing and other apps.
>>
>> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
>> index 4374196..9af7497 100644
>> --- a/include/linux/tcp.h
>> +++ b/include/linux/tcp.h
>> @@ -232,7 +232,8 @@ struct tcp_sock {
>>                  fastopen_connect:1, /* FASTOPEN_CONNECT sockopt */
>>                  fastopen_no_cookie:1, /* Allow send/recv SYN+data without a cookie */
>>                  is_sack_reneg:1,    /* in recovery from loss with SACK reneg? */
>> -               unused:2;
>> +               unused:1,
>> +               wqp_called:1;
>>          u8      nonagle     : 4,/* Disable Nagle algorithm? */
>>                  thin_lto    : 1,/* Use linear timeouts for thin streams */
>>                  recvmsg_inq : 1,/* Indicate # of bytes in queue upon recvmsg */
>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>> index 1a1fcb3..0c29bdd 100644
>> --- a/net/ipv4/tcp.c
>> +++ b/net/ipv4/tcp.c
>> @@ -2534,6 +2534,9 @@ void tcp_write_queue_purge(struct sock *sk)
>>          INIT_LIST_HEAD(&tcp_sk(sk)->tsorted_sent_queue);
>>          sk_mem_reclaim(sk);
>>          tcp_clear_all_retrans_hints(tcp_sk(sk));
>> +       tcp_sk(sk)->highest_sack = NULL;
>> +       tcp_sk(sk)->sacked_out = 0;
>> +       tcp_sk(sk)->wqp_called = 1;
>>          tcp_sk(sk)->packets_out = 0;
>>          inet_csk(sk)->icsk_backoff = 0;
>>   }
>>
>>
> 
> Neal
> 
> Since tcp_write_queue_purge() calls tcp_rtx_queue_purge() and we're deleting everything in the retrans queue there, doesn't it make sense to zero out all of the associated counters? Obviously clearing sacked_out is helping here, but is there a reason to keep track of lost_out, retrans_out, etc. if the retrans queue is now empty? Maybe call tcp_clear_retrans() from tcp_rtx_queue_purge()?

First, I would like to understand if we hit this problem on current upstream kernels.

Maybe a backport forgot a dependency.

tcp_write_queue_purge() calls tcp_clear_all_retrans_hints(), not tcp_clear_retrans(),
and this is probably for a reason.

Brute force clearing these fields might hide a serious bug.
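
To make the distinction concrete, here is a rough userspace model of the two
helpers. It is not kernel code: struct model_tp and every model_* function are
hypothetical stand-ins, and only the field names are borrowed from struct
tcp_sock. The assumption is that the hints helper only drops cached skb
pointers, that tcp_clear_retrans() zeroes the retransmit counters, and that the
consistency check resembles tcp_verify_left_out() (sacked_out + lost_out must
not exceed packets_out).

```c
#include <stddef.h>

/* Hypothetical stand-in for the few tcp_sock fields discussed in this thread. */
struct model_tp {
	unsigned int packets_out;	/* segments sent and not yet acked */
	unsigned int sacked_out;	/* segments SACKed by the peer */
	unsigned int lost_out;		/* segments marked lost */
	unsigned int retrans_out;	/* segments retransmitted */
	void *lost_skb_hint;		/* cached pointer into the rtx queue */
	void *retransmit_skb_hint;	/* cached pointer into the rtx queue */
};

/* Models tcp_clear_all_retrans_hints(): drops cached pointers, no counters. */
static void model_clear_all_retrans_hints(struct model_tp *tp)
{
	tp->lost_skb_hint = NULL;
	tp->retransmit_skb_hint = NULL;
}

/* Models the counter resets assumed to be done by tcp_clear_retrans(). */
static void model_clear_retrans(struct model_tp *tp)
{
	tp->retrans_out = 0;
	tp->lost_out = 0;
	tp->sacked_out = 0;
}

/* Returns 1 when the counters agree with each other, 0 otherwise. */
static int model_left_out_ok(const struct model_tp *tp)
{
	return tp->sacked_out + tp->lost_out <= tp->packets_out;
}

/* Models the purge without the proposed patch: hints and packets_out only. */
static void model_write_queue_purge(struct model_tp *tp)
{
	model_clear_all_retrans_hints(tp);
	tp->packets_out = 0;
}
```

In this model, a socket that enters the purge with sacked_out != 0 leaves it
with packets_out == 0 and a stale sacked_out, so model_left_out_ok() fails
until model_clear_retrans() is also called. That shows why zeroing the counters
makes the symptom go away, but it says nothing about how sacked_out got out of
sync in the first place.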