netdev - Re: Bug in tcp_rtx

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <7492f8e7-d4cc-8fe2-a02c-6749a8068be5@gmail.com>
Date:   Mon, 30 May 2022 11:09:30 -0700
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     Laurent Fasnacht <laurent.fasnacht@...ton.ch>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Eric Dumazet <edumazet@...gle.com>,
        Yuchung Cheng <ycheng@...gle.com>,
        Neal Cardwell <ncardwell@...gle.com>
Subject: Re: Bug in tcp_rtx_synack?


On 5/30/22 03:08, Laurent Fasnacht wrote:
> Hello,
>
> I'm having the following bug on a 5.16 kernel, it happens periodically (a few times per day, on every of our production server that has this kernel). I'm unable to reproduce on machines with lower load and I also know for sure that it doesn't happen on the 5.10 kernel.
>
> I wonder if it's related to trace_tcp_retransmit_synack?
>
> I'm happy to help, let me know.
>
> Cheers,
> Laurent
>
> ---
>
> (gdb) l *(tcp_rtx_synack+0x8d)
> 0xffffffff817ee76d is in tcp_rtx_synack (arch/x86/include/asm/preempt.h:95).
> 90       * a decrement which hits zero means we have no preempt_count and should
> 91       * reschedule.
> 92       */
> 93      static __always_inline bool __preempt_count_dec_and_test(void)
> 94      {
> 95              return GEN_UNARY_RMWcc("decl", __preempt_count, e, __percpu_arg([var]));
> 96      }
> 97
> 98      /*
> 99       * Returns true when we need to resched and can (barring IRQ state).
> (gdb) l *(tcp_rtx_synack+0x8d-4)
> 0xffffffff817ee769 is in tcp_rtx_synack (include/trace/events/tcp.h:190).
> 185             TP_PROTO(struct sock *sk),
> 186
> 187             TP_ARGS(sk)
> 188     );
> 189
> 190     TRACE_EVENT(tcp_retransmit_synack,
> 191
> 192             TP_PROTO(const struct sock *sk, const struct request_sock *req),
> 193
> 194             TP_ARGS(sk, req),
>
> --
>
> BUG: using __this_cpu_add() in preemptible [00000000] code: epollpep/2180
> caller is tcp_rtx_synack.part.0+0x36/0xc0
> CPU: 10 PID: 2180 Comm: epollpep Tainted: G           OE     5.16.0-0.bpo.4-amd64 #1  Debian 5.16.12-1~bpo11+1
> Hardware name: Supermicro SYS-5039MC-H8TRF/X11SCD-F, BIOS 1.7 11/23/2021
> Call Trace:
>   <TASK>
>   dump_stack_lvl+0x48/0x5e
>   check_preemption_disabled+0xde/0xe0
>   tcp_rtx_synack.part.0+0x36/0xc0
>   tcp_rtx_synack+0x8d/0xa0
>   ? kmem_cache_alloc+0x2e0/0x3e0
>   ? apparmor_file_alloc_security+0x3b/0x1f0
>   inet_rtx_syn_ack+0x16/0x30
>   tcp_check_req+0x367/0x610
>   tcp_rcv_state_process+0x91/0xf60
>   ? get_nohz_timer_target+0x18/0x1a0
>   ? lock_timer_base+0x61/0x80
>   ? preempt_count_add+0x68/0xa0
>   tcp_v4_do_rcv+0xbd/0x270
>   __release_sock+0x6d/0xb0
>   release_sock+0x2b/0x90
>   sock_setsockopt+0x138/0x1140
>   ? __sys_getsockname+0x7e/0xc0
>   ? aa_sk_perm+0x3e/0x1a0
>   __sys_setsockopt+0x198/0x1e0
>   __x64_sys_setsockopt+0x21/0x30
>   do_syscall_64+0x38/0xc0
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7fefe7d4441a
> Code: ff ff ff c3 0f 1f 40 00 48 8b 15 71 ea 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 46 ea 0b 00 f7 d8 64 89 01 48
> RSP: 002b:00007ffca1cd0ab8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
> RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fefe7d4441a
> RDX: 0000000000000009 RSI: 0000000000000001 RDI: 00000000000006f3
> RBP: 00007ffca1cd1410 R08: 0000000000000004 R09: 0000560e9f8a55ec
> R10: 00007ffca1cd10f0 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007ffca1cd1190 R14: 00007ffca1cd1198 R15: 00007ffca1cd23f0
>   </TASK>
>

I think this is related to FastOpen server code.

It seems that we retransmit a SYNACK if we receive the SYN another time.

This means tcp_rtx_synack() can be called from process context (instead 
of BH handler) in this particular case.

Fix could be:

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 
b4b2284ed4a2c9e2569bd945e3b4e023c5502f25..1c054431e358328fe3849f5a45aaa88308a1e1c8 
100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -4115,8 +4115,8 @@ int tcp_rtx_synack(const struct sock *sk, struct 
request_sock *req)
         res = af_ops->send_synack(sk, NULL, &fl, req, NULL, 
TCP_SYNACK_NORMAL,
                                   NULL);
         if (!res) {
-               __TCP_INC_STATS(sock_net(sk), TCP_MIB_RETRANSSEGS);
-               __NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSYNRETRANS);
+               TCP_INC_STATS(sock_net(sk), TCP_MIB_RETRANSSEGS);
+               NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSYNRETRANS);
                 if (unlikely(tcp_passive_fastopen(sk)))
                         tcp_sk(sk)->total_retrans++;
                 trace_tcp_retransmit_synack(sk, req);