[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJwzM1k7iW9tJZiO-JhVbnT-EmwaJbsroaVbJLnSVY-tyCzjLQ@mail.gmail.com>
Date: Sat, 9 Nov 2019 21:59:10 -0800
From: Avinash Patil <avinashapatil@...il.com>
To: netdev@...r.kernel.org
Subject: Possible bug in TCP retry logic/Kernel crash
Hi everyone,
Kernel: Linux 4.19.35 kernel built from linux-stable
I am seeing this issue on our platform and suspect this is TCP issue:
[ 3148.796319] Oops
[ 3148.799789] Path: /usr/bin/qtn_dut
[ 3148.803306] CPU: 0 PID: 1341 Comm: qtn_dut Tainted: P O
4.19.35 #4
[ 3148.810876]
[ 3148.810876] [ECR ]: 0x00220100 => Invalid Read @ 0x00000008 by
insn @ 0x8b1bc7e8
[ 3148.820064] [EFA ]: 0x00000008
[ 3148.820064] [BLINK ]: tcp_try_coalesce+0x3c/0xf0
[ 3148.820064] [ERET ]: skb_try_coalesce+0x94/0x3a0
[ 3148.832704] [STAT32]: 0x00000206 : K E2 E1
[ 3148.837677] BTA: 0x8b309ca3 SP: 0x8c92db44 FP: 0x00000000
[ 3148.843338] LPS: 0x8b304b94 LPE: 0x8b304b9c LPC: 0x00000000
[ 3148.849023] r00: 0x8c8743c0 r01: 0x8c92a0e0 r02: 0x8c92dbaa
[ 3148.849023] r03: 0x00000000 r04: 0x40000214 r05: 0x8b221ab8
[ 3148.849023] r06: 0x8bab8f3d r07: 0x00000000 r08: 0x00000000
[ 3148.849023] r09: 0x00000000 r10: 0x1f4f9e47 r11: 0x00000000
[ 3148.849023] r12: 0x00000000 r13: 0x8afcecfc r14: 0x000d2bb8
[ 3148.849023] r15: 0x5682fbc0 r16: 0xffffffff r17: 0x00000000
[ 3148.849023] r18: 0x00000001 r19: 0x5682faa4 r20: 0x5682fa84
[ 3148.849023] r21: 0x5682fa64 r22: 0x5682fb30 r23: 0x00000020
[ 3148.849023] r24: 0x000d2bb8 r25: 0x5682fbc0
[ 3148.849023]
[ 3148.849023]
[ 3148.901689]
[ 3148.901689] Stack Trace:
[ 3148.905781] Firmware build version: AAA
[ 3148.905781] Firmware configuration: BBB
[ 3148.905781] Hardware ID : CCC
[ 3148.920879] skb_try_coalesce+0x94/0x3a0
[ 3148.925026] tcp_try_coalesce+0x3c/0xf0
[ 3148.929079] tcp_queue_rcv+0x44/0x164
[ 3148.932953] tcp_data_queue+0x32a/0x75c
[ 3148.936946] tcp_rcv_established+0x37e/0x7d4
[ 3148.941438] tcp_v4_do_rcv+0xda/0x120
[ 3148.945320] tcp_v4_rcv+0x8f2/0xa04
[ 3148.949034] ip_local_deliver+0x72/0x208
[ 3148.953179] process_backlog+0xbe/0x1b0
[ 3148.957169] net_rx_action+0xfe/0x27c
[ 3148.961057] __do_softirq+0xf0/0x228
[ 3148.964863] __local_bh_enable_ip+0xae/0xb4
[ 3148.969277] ip_finish_output2.constprop.6+0x116/0x368
[ 3148.974641] __tcp_transmit_skb+0x56e/0xb3c
[ 3148.979039] tcp_write_xmit+0x34a/0x126c
[ 3148.983174] __tcp_push_pending_frames+0x28/0x94
[ 3148.987992] tcp_sendmsg_locked+0xa7a/0xc14
[ 3148.992386] tcp_sendmsg+0x1e/0x34
[ 3148.995935] __sys_sendto+0xc8/0xf4
[ 3148.999642] EV_Trap+0x11c/0x120
[ 3149.003057]
Conditions under which this happens:
There are 2 processes running on platform which communicate with TCP
sockets- P1 and P2.
1. P1 has 2 TCP sockets- one TCP client to communicate with P2 while
another TCP server to listen to client running on another machine.
2. P1 has issued command to P2 and P2 is preparing response.
3. While P2 is preparing response, P1 receives zero sized packet from
remote server and closes its server socket treating this as error.
Note: client socket is open/active. P2 prepares its response but its
buffered
4. P1 respawns server socket and issues another command to P2 and
waits for response.
5. P2 now sends 2 sets of data- one for old session and one response
for current command. I see kernel panic with backtrace as above.
There is another symptom of this issue :
# [ 194.416963] Alignment trap: fault in fix-up 0000a260 at [<00000001>]
[ 194.423419]
[ 194.423419] Misaligned Access
[ 194.427950] Path: (null)
[ 194.430517] CPU: 0 PID: 0 Comm: swapper Tainted: P O
4.19.35 #3
[ 194.437816]
[ 194.437816] [ECR ]: 0x00230400 => Misaligned r/w from 0x00000001
[ 194.445597] [EFA ]: 0x00000001
[ 194.445597] [BLINK ]: tcp_ack+0x5e6/0x1598
[ 194.445597] [ERET ]: tcp_ack+0x606/0x1598
[ 194.457087] [STAT32]: 0x0000020e : K A1 E2 E1
[ 194.462137] BTA: 0x8b3bf3b7 SP: 0x8b4c9c04 FP: 0x00000000
[ 194.467777] LPS: 0x8b3ba37c LPE: 0x8b3ba384 LPC: 0x00000000
[ 194.473454] r00: 0x8ce503c0 r01: 0x8f3b34e4 r02: 0x00000001
[ 194.473454] r03: 0xa0076e71 r04: 0x00000000 r05: 0x06e0ed35
[ 194.473454] r06: 0x8f3b3800 r07: 0x00000000 r08: 0x0b9362f8
[ 194.473454] r09: 0x00000000 r10: 0x00032e0c r11: 0x00000000
[ 194.473454] r12: 0xefec0000 r13: 0x8f3b3400 r14: 0x8f3b3c00
[ 194.473454] r15: 0x8ce503c0 r16: 0x00000001 r17: 0x8f3b3800
[ 194.473454] r18: 0x00000000 r19: 0x00000000 r20: 0x66c2443f
[ 194.473454] r21: 0x8b4c9c80 r22: 0x00000004 r23: 0x00000001
[ 194.473454] r24: 0x00000000 r25: 0x8b4cb2e0
[ 194.473454]
[ 194.473454]
[ 194.526128]
[ 194.526128] Stack Trace:
[ 194.530209]
[ 194.530209] Firmware build version: pyang_sh-swbuild04_main2ac-cl101263
[ 194.530216]
[ 194.530216] Firmware configuration: pearl_10gax_config
[ 194.538389]
[ 194.538389] Hardware ID : 65535
[ 194.550632] tcp_ack+0x606/0x1598
[ 194.554160] tcp_rcv_established+0x458/0x7d4
[ 194.558646] tcp_v4_do_rcv+0xda/0x120
[ 194.562521] tcp_v4_rcv+0x8f2/0xa04
[ 194.566162] ip_local_deliver+0x72/0x208
[ 194.570287] netif_receive_skb+0x62/0x104
[ 194.574510] br_handle_frame_finish.constprop.2+0x1a6/0x270
[ 194.580299] br_handle_frame+0x170/0x2a0
[ 194.584427] __netif_receive_skb_core+0x156/0x650
[ 194.589346] netif_receive_skb+0x50/0x104
[ 194.593582] wowlan_magic_packet_check+0xc68/0x16b8 [switch_tqe]
[ 194.599825] net_rx_action+0xfe/0x27c
[ 194.603695] __do_softirq+0xf0/0x228
[ 194.607477] __handle_domain_irq+0x5c/0x98
[ 194.611732] handle_interrupt_level1+0xcc/0xd8
Do you happen to know if this is already reported/fixed?
I can run more experiments/gather more debug data/stats if required.
Thanks in advance.
-Avinash
Powered by blists - more mailing lists