Message-ID: <cae50d97-5d19-7b35-0e82-630f905c1bf6@gmail.com>
Date: Wed, 30 Oct 2019 18:27:27 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Josh Hunt <johunt@...mai.com>,
Subash Abhinov Kasiviswanathan <subashab@...eaurora.org>,
Neal Cardwell <ncardwell@...gle.com>
Cc: Netdev <netdev@...r.kernel.org>, Yuchung Cheng <ycheng@...gle.com>,
Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: Crash when receiving FIN-ACK in TCP_FIN_WAIT1 state
On 10/30/19 2:48 PM, Josh Hunt wrote:
> On 10/30/19 11:27 AM, Subash Abhinov Kasiviswanathan wrote:
>>> Thanks. Do you mind sharing what your patch looked like, so we can
>>> understand precisely what was changed?
>>>
>>> Also, are you able to share what the workload looked like that tickled
>>> this issue? (web client? file server?...)
>>
>> Sure. This was seen only on our regression racks and the workload there
>> is a combination of FTP, browsing and other apps.
>>
>> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
>> index 4374196..9af7497 100644
>> --- a/include/linux/tcp.h
>> +++ b/include/linux/tcp.h
>> @@ -232,7 +232,8 @@ struct tcp_sock {
>>                 fastopen_connect:1, /* FASTOPEN_CONNECT sockopt */
>>                 fastopen_no_cookie:1, /* Allow send/recv SYN+data without a cookie */
>>                 is_sack_reneg:1, /* in recovery from loss with SACK reneg? */
>> -               unused:2;
>> +               unused:1,
>> +               wqp_called:1;
>>         u8 nonagle : 4,/* Disable Nagle algorithm? */
>>                 thin_lto : 1,/* Use linear timeouts for thin streams */
>>                 recvmsg_inq : 1,/* Indicate # of bytes in queue upon recvmsg */
>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>> index 1a1fcb3..0c29bdd 100644
>> --- a/net/ipv4/tcp.c
>> +++ b/net/ipv4/tcp.c
>> @@ -2534,6 +2534,9 @@ void tcp_write_queue_purge(struct sock *sk)
>>         INIT_LIST_HEAD(&tcp_sk(sk)->tsorted_sent_queue);
>>         sk_mem_reclaim(sk);
>>         tcp_clear_all_retrans_hints(tcp_sk(sk));
>> +       tcp_sk(sk)->highest_sack = NULL;
>> +       tcp_sk(sk)->sacked_out = 0;
>> +       tcp_sk(sk)->wqp_called = 1;
>>         tcp_sk(sk)->packets_out = 0;
>>         inet_csk(sk)->icsk_backoff = 0;
>>  }
>>
>>
>
> Neal
>
> Since tcp_write_queue_purge() calls tcp_rtx_queue_purge() and we're deleting everything in the retrans queue there, doesn't it make sense to zero out all of those associated counters? Obviously clearing sacked_out is helping here, but is there a reason to keep track of lost_out, retrans_out, etc if retrans queue is now empty? Maybe calling tcp_clear_retrans() from tcp_rtx_queue_purge() ?

First, I would like to understand whether we hit this problem on current upstream kernels.
Maybe a backport forgot a dependency.

tcp_write_queue_purge() calls tcp_clear_all_retrans_hints(), not tcp_clear_retrans(),
and this is probably for a reason.

Brute-force clearing of these fields might hide a serious bug.
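
To make the distinction above concrete, here is a rough userspace sketch, not kernel code. The struct and every model_* name are hypothetical stand-ins; the field names follow the ones mentioned in this thread, and the consistency check is only an assumption modeled on the idea that sacked_out + lost_out should never exceed packets_out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a few struct tcp_sock fields. */
struct model_tcp_sock {
	void *lost_skb_hint;        /* cached pointer into the rtx queue */
	void *retransmit_skb_hint;  /* cached pointer into the rtx queue */
	unsigned int packets_out;
	unsigned int sacked_out;
	unsigned int lost_out;
	unsigned int retrans_out;
};

/* Models the "hints" helper: drops cached pointers, leaves counters alone. */
static void model_clear_all_retrans_hints(struct model_tcp_sock *tp)
{
	tp->lost_skb_hint = NULL;
	tp->retransmit_skb_hint = NULL;
}

/* Models the counter-resetting helper (only the counters named in the thread). */
static void model_clear_retrans(struct model_tcp_sock *tp)
{
	tp->retrans_out = 0;
	tp->lost_out = 0;
	tp->sacked_out = 0;
}

/* Models the tail of the purge path as shown in the quoted diff context. */
static void model_write_queue_purge(struct model_tcp_sock *tp)
{
	model_clear_all_retrans_hints(tp);
	tp->packets_out = 0;
}

/* Assumed invariant: SACKed plus lost packets cannot exceed packets in flight. */
static int model_counters_consistent(const struct model_tcp_sock *tp)
{
	return tp->sacked_out + tp->lost_out <= tp->packets_out;
}
```

In this model, purging a socket that still has sacked_out = 1 leaves the counters inconsistent until model_clear_retrans() is also called. That shows why zeroing the counters makes the symptom go away, but it says nothing about why sacked_out was still non-zero at that point, which is the question that matters for finding the real bug.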