netdev - Re: Crash when receiving FIN-ACK in TCP_FIN_WAIT1 state


Message-ID: <f2016893-bf9e-3b65-4fe8-ff1bba4f4ced@akamai.com>
Date:   Tue, 3 Dec 2019 09:24:34 -0800
From:   Josh Hunt <johunt@...mai.com>
To:     subashab@...eaurora.org, Eric Dumazet <eric.dumazet@...il.com>
Cc:     Neal Cardwell <ncardwell@...gle.com>,
        Netdev <netdev@...r.kernel.org>,
        Yuchung Cheng <ycheng@...gle.com>
Subject: Re: Crash when receiving FIN-ACK in TCP_FIN_WAIT1 state

On 11/29/19 6:51 PM, subashab@...eaurora.org wrote:
>>>> Since tcp_write_queue_purge() calls tcp_rtx_queue_purge() and we're 
>>>> deleting everything in the retrans queue there, doesn't it make 
>>>> sense to zero out all of those associated counters? Obviously 
>>>> clearing sacked_out is helping here, but is there a reason to keep 
>>>> track of lost_out, retrans_out, etc if retrans queue is now empty? 
>>>> Maybe calling tcp_clear_retrans() from tcp_rtx_queue_purge() ?
>>>
>>> First, I would like to understand if we hit this problem on current 
>>> upstream kernels.
>>>
>>> Maybe a backport forgot a dependency.
>>>
>>> tcp_write_queue_purge() calls tcp_clear_all_retrans_hints(), not 
>>> tcp_clear_retrans(),
>>> this is probably for a reason.
>>>
>>> Brute force clearing these fields might hide a serious bug.
>>>
>>
>> I guess we are all too busy to get more understanding on this :/
> 
> Our test devices are on 4.19.x and it is not possible to switch to a newer
> version. Perhaps Josh has seen this on a newer kernel.

Sorry, I've been out of town without email access. To be clear, I've never
seen this crash. I've only noticed that we do not clear some counters
when we clear out the retransmit queue, and this caught my eye while
debugging another, unrelated issue. I will try to get some cycles this
week to instrument a kernel and reproduce the behavior I was seeing. My
concern, IIRC, was more around tcp_left_out() being > packets_out and
retrans_out, causing tcp_packets_in_flight() to wrap. Anyway, I'll report
my findings on this thread if they seem relevant; otherwise maybe I'll
start another discussion thread. I don't want to pollute this one with
my ramblings...
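
To make the wrap concern concrete, here is a rough userspace sketch of the
arithmetic, not kernel code. It assumes the unsigned 32-bit counters and the
formula used by tcp_left_out() and tcp_packets_in_flight() in
include/net/tcp.h; the struct name "counters" and the unprefixed helper names
are hypothetical stand-ins for struct tcp_sock and the real helpers.

```c
#include <stdint.h>

/* Hypothetical stand-in for the few struct tcp_sock counters discussed here. */
struct counters {
	uint32_t packets_out;  /* segments sent and not yet acked */
	uint32_t sacked_out;   /* segments SACKed by the peer */
	uint32_t lost_out;     /* segments marked lost */
	uint32_t retrans_out;  /* retransmitted segments still outstanding */
};

/* Same arithmetic as tcp_left_out(): segments believed to have left the network. */
static inline uint32_t left_out(const struct counters *c)
{
	return c->sacked_out + c->lost_out;
}

/*
 * Same arithmetic as tcp_packets_in_flight(). All operands are unsigned,
 * so if left_out() exceeds packets_out + retrans_out the result wraps
 * around to a huge value instead of going negative.
 */
static inline uint32_t packets_in_flight(const struct counters *c)
{
	return c->packets_out - left_out(c) + c->retrans_out;
}
```

For example, if the queue has been purged so that packets_out is 0 but a
stale lost_out of 2 is left behind (with sacked_out and retrans_out at 0),
packets_in_flight() returns 4294967294 rather than 0.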

Josh