netdev - Re: Crash when receiving FIN-ACK in TCP_FIN


Message-ID: <5a267a9d-2bf5-4978-b71d-0c8e71a64807@gmail.com>
Date:   Tue, 26 Nov 2019 21:30:47 -0800
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     Eric Dumazet <eric.dumazet@...il.com>,
        Josh Hunt <johunt@...mai.com>,
        Subash Abhinov Kasiviswanathan <subashab@...eaurora.org>,
        Neal Cardwell <ncardwell@...gle.com>
Cc:     Netdev <netdev@...r.kernel.org>, Yuchung Cheng <ycheng@...gle.com>
Subject: Re: Crash when receiving FIN-ACK in TCP_FIN_WAIT1 state



On 10/30/19 6:27 PM, Eric Dumazet wrote:
> 
> 
> On 10/30/19 2:48 PM, Josh Hunt wrote:
>> On 10/30/19 11:27 AM, Subash Abhinov Kasiviswanathan wrote:
>>>> Thanks. Do you mind sharing what your patch looked like, so we can
>>>> understand precisely what was changed?
>>>>
>>>> Also, are you able to share what the workload looked like that tickled
>>>> this issue? (web client? file server?...)
>>>
>>> Sure. This was seen only on our regression racks and the workload there
>>> is a combination of FTP, browsing and other apps.
>>>
>>> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
>>> index 4374196..9af7497 100644
>>> --- a/include/linux/tcp.h
>>> +++ b/include/linux/tcp.h
>>> @@ -232,7 +232,8 @@ struct tcp_sock {
>>>                  fastopen_connect:1, /* FASTOPEN_CONNECT sockopt */
>>>                  fastopen_no_cookie:1, /* Allow send/recv SYN+data without a cookie */
>>>                  is_sack_reneg:1,    /* in recovery from loss with SACK reneg? */
>>> -               unused:2;
>>> +               unused:1,
>>> +               wqp_called:1;
>>>          u8      nonagle     : 4,/* Disable Nagle algorithm? */
>>>                  thin_lto    : 1,/* Use linear timeouts for thin streams */
>>>                  recvmsg_inq : 1,/* Indicate # of bytes in queue upon recvmsg */
>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>> index 1a1fcb3..0c29bdd 100644
>>> --- a/net/ipv4/tcp.c
>>> +++ b/net/ipv4/tcp.c
>>> @@ -2534,6 +2534,9 @@ void tcp_write_queue_purge(struct sock *sk)
>>>          INIT_LIST_HEAD(&tcp_sk(sk)->tsorted_sent_queue);
>>>          sk_mem_reclaim(sk);
>>>          tcp_clear_all_retrans_hints(tcp_sk(sk));
>>> +       tcp_sk(sk)->highest_sack = NULL;
>>> +       tcp_sk(sk)->sacked_out = 0;
>>> +       tcp_sk(sk)->wqp_called = 1;
>>>          tcp_sk(sk)->packets_out = 0;
>>>          inet_csk(sk)->icsk_backoff = 0;
>>>   }
>>>
>>>
>>
>> Neal
>>
>> Since tcp_write_queue_purge() calls tcp_rtx_queue_purge() and we're deleting everything in the retrans queue there, doesn't it make sense to zero out all of the associated counters? Obviously clearing sacked_out is helping here, but is there a reason to keep tracking lost_out, retrans_out, etc. if the retrans queue is now empty? Maybe call tcp_clear_retrans() from tcp_rtx_queue_purge()?
> 
> First, I would like to understand if we hit this problem on current upstream kernels.
> 
> Maybe a backport forgot a dependency.
> 
> tcp_write_queue_purge() calls tcp_clear_all_retrans_hints(), not tcp_clear_retrans(),
> this is probably for a reason.
> 
> Brute force clearing these fields might hide a serious bug.
> 

I guess we are all too busy to get more understanding on this :/
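
For anyone picking this up later, the distinction discussed above (clearing the retransmit hints versus clearing the retransmit counters) can be modelled in user space. The sketch below is only an illustration, not kernel code: struct toy_tp, rtx_queue_len and every toy_* function are hypothetical names invented for this example, and the field names are merely borrowed from the fields mentioned in the thread. It assumes a purge that empties the queue, clears the hints and packets_out, and leaves the other counters alone. It then shows how a consistency check would flag the stale sacked_out, whereas unconditionally zeroing the counters would make the same check pass silently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the handful of fields discussed in this thread.
 * This is NOT struct tcp_sock; rtx_queue_len is a hypothetical field
 * standing in for "how many skbs are in the retransmit queue". */
struct toy_tp {
	unsigned int packets_out;
	unsigned int sacked_out;
	unsigned int lost_out;
	unsigned int retrans_out;
	const void *retransmit_skb_hint;
	const void *lost_skb_hint;
	const void *highest_sack;
	unsigned int rtx_queue_len;
};

/* Hints only: forget cached pointers into the queue. */
static inline void toy_clear_all_retrans_hints(struct toy_tp *tp)
{
	tp->lost_skb_hint = NULL;
	tp->retransmit_skb_hint = NULL;
}

/* Counters only: the "brute force" reset questioned in the thread. */
static inline void toy_clear_retrans(struct toy_tp *tp)
{
	tp->retrans_out = 0;
	tp->lost_out = 0;
	tp->sacked_out = 0;
}

/* Assumed purge behaviour: drop the queue, clear hints and packets_out,
 * and leave sacked_out, lost_out and retrans_out untouched. */
static inline void toy_write_queue_purge(struct toy_tp *tp)
{
	tp->rtx_queue_len = 0;
	toy_clear_all_retrans_hints(tp);
	tp->packets_out = 0;
}

/* Returns true when the accounting matches the queue state.
 * With an empty queue every counter must be zero and highest_sack unset;
 * otherwise sacked_out + lost_out must not exceed packets_out. */
static inline bool toy_accounting_consistent(const struct toy_tp *tp)
{
	if (tp->rtx_queue_len == 0)
		return tp->packets_out == 0 && tp->sacked_out == 0 &&
		       tp->lost_out == 0 && tp->retrans_out == 0 &&
		       tp->highest_sack == NULL;
	return tp->sacked_out + tp->lost_out <= tp->packets_out;
}
```

In this model, a socket with packets_out = 3 and sacked_out = 1 is consistent before the purge and inconsistent right after it. Calling toy_clear_retrans() makes the check pass again, but it says nothing about how the counter got out of step with the queue, which is the concern about hiding a more serious bug.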