Message-ID: <20180207021647epcms1p128701b4f7e63a88838dd132221119571@epcms1p1>
Date: Wed, 07 Feb 2018 11:16:47 +0900
From: 배석진 <soukjin.bae@...sung.com>
To: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: [Android][Kernel][TCP/IP] report of packet discarding during tcp
handshaking
Hello,
This is Bae from Samsung Electronics.
We are seeing packets discarded during the TCP 3-way handshake.
It looks like Mr. Dumazet already tried to fix a similar issue in this patch: https://android.googlesource.com/kernel/common/+/5e0724d027f0548511a2165a209572d48fe7a4c8
but we are still hitting another corner case.
The problem needs the following preconditions:
(1) the last ACK of the 3-way handshake and the next packet arrive at almost the same time
(2) the next packet, i.e. the first data packet, is fragmented
(3) RPS is enabled
[tcp dump]
No. A-Time          Source       Destination  Len   Seq  Info
1   08:35:18.115259 193.81.6.70  10.217.0.47  84    0    [SYN] Seq=0 Win=21504 Len=0 MSS=1460
2   08:35:18.115888 10.217.0.47  193.81.6.70  84    0    6100 → 5063 [SYN, ACK] Seq=0 Ack=1 Win=29200 Len=0 MSS=1460
3   08:35:18.142385 193.81.6.70  10.217.0.47  80    1    5063 → 6100 [ACK] Seq=1 Ack=1 Win=21504 Len=0
4   08:35:18.142425 193.81.6.70  10.217.0.47  1516       Fragmented IP protocol (proto=Encap Security Payload 50, off=0, ID=6e24) [Reassembled in #5]
5   08:35:18.142449 193.81.6.70  10.217.0.47  60    1    5063 → 6100 [ACK] Seq=1 Ack=1 Win=21504 Len=1460 [TCP segment of a reassembled PDU]
6   08:35:21.227070 193.81.6.70  10.217.0.47  1516       Fragmented IP protocol (proto=Encap Security Payload 50, off=0, ID=71e9) [Reassembled in #7]
7   08:35:21.227191 193.81.6.70  10.217.0.47  60    1    [TCP Retransmission] 5063 → 6100 [ACK] Seq=1 Ack=1 Win=21504 Len=1460
8   08:35:21.228822 10.217.0.47  193.81.6.70  80    1    6100 → 5063 [ACK] Seq=1 Ack=1461 Win=32120 Len=0
- The last ACK of the handshake (No.3) and the next data packet (No.4, 5) arrived just 40 us apart.
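
The timing in the capture above can be checked with a small script. This is only an illustrative sketch: the timestamps are copied from the A-Time column, and the helper name gap_us is made up for this example.

```python
from datetime import datetime, timedelta

# Arrival times copied from the capture above (column "A-Time").
ARRIVAL = {
    3: "08:35:18.142385",  # last ACK of the handshake
    4: "08:35:18.142425",  # first fragment of the data packet
    6: "08:35:21.227070",  # first fragment of the retransmission
}


def gap_us(first: int, second: int) -> int:
    """Microseconds between two packets, identified by capture number."""
    fmt = "%H:%M:%S.%f"
    t_first = datetime.strptime(ARRIVAL[first], fmt)
    t_second = datetime.strptime(ARRIVAL[second], fmt)
    return (t_second - t_first) // timedelta(microseconds=1)


print(gap_us(3, 4))  # ACK -> first data fragment: 40
print(gap_us(4, 6))  # dropped data -> retransmission: 3084645
```

So the two packets are 40 us apart, and the retransmission of the dropped data arrives about 3.08 s later.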
[kernel log]
- stage 1
<3>[ 1037.669229] I[0: system_server: 3778] get_rps_cpu: skb(64), check hash value:3412396090
<3>[ 1037.669261] I[0: system_server: 3778] get_rps_cpu: skb(1500), check hash value:158575680
<3>[ 1037.669285] I[0: system_server: 3778] get_rps_cpu: skb(44), check hash value:158575680
- stage 2
<3>[ 1037.669541] I[1: Binder:3778_13: 8391] tcp_v4_rcv: Enter! skb(seq:A93E087B, len:1480)
<3>[ 1037.669552] I[2:Jit thread pool:12990] tcp_v4_rcv: Enter! skb(seq:A93E087B, len:20)
<3>[ 1037.669564] I[2:Jit thread pool:12990] tcp_v4_rcv: check sk_state:12 skb(seq:A93E087B, len:20)
<3>[ 1037.669585] I[2:Jit thread pool:12990] tcp_check_req, Enter!: skb(seq:A93E087B, len:20)
<3>[ 1037.669612] I[1: Binder:3778_13: 8391] tcp_v4_rcv: check sk_state:12 skb(seq:A93E087B, len:1480)
<3>[ 1037.669625] I[1: Binder:3778_13: 8391] tcp_check_req, Enter!: skb(seq:A93E087B, len:1480)
<3>[ 1037.669653] I[2:Jit thread pool:12990] tcp_check_req, skb(seq:A93E087B, len:20), own_req:1
<3>[ 1037.669668] I[1: Binder:3778_13: 8391] tcp_check_req, skb(seq:A93E087B, len:1480), own_req:0
<3>[ 1037.669708] I[2:Jit thread pool:12990] tcp_rcv_state_process, Established: skb(seq:A93E087B, len:20)
<3>[ 1037.669724] I[1: Binder:3778_13: 8391] tcp_v4_rcv: discard_relse skb(seq:A93E087B, len:1480)
- stage 1
Because the data packet was fragmented (No.4 and 5),
RPS hashed it to a different core (cpu1) than the last ACK (cpu2).
So the last ACK and the data packet were handled on different cores almost simultaneously, both in the NEW_SYN_RECV state.
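
As a rough illustration of how two different hash values can end up on two different cores, the sketch below feeds the hash values from the stage 1 log into the scaling step that, as far as I understand it, get_rps_cpu() applies to pick an entry in the RPS map (reciprocal_scale()). The RPS map itself is not given in this report: RPS_MAP = [1, 2] is a hypothetical map chosen only because it is consistent with the log, and pick_cpu is a made-up helper name. RFS flow-table lookups are ignored.

```python
def reciprocal_scale(val: int, ep_ro: int) -> int:
    """Map a 32-bit value onto [0, ep_ro) without a division,
    mirroring the kernel helper of the same name."""
    return ((val & 0xFFFFFFFF) * ep_ro) >> 32


# ASSUMPTION: CPUs 1 and 2 are enabled for RPS (mask 0x6).
# The real rps_cpus setting is not stated in the report.
RPS_MAP = [1, 2]

# Hash values copied from the stage 1 kernel log.
LOGGED = [
    ("last ACK, skb(64)", 3412396090),
    ("fragment, skb(1500)", 158575680),
    ("fragment, skb(44)", 158575680),
]


def pick_cpu(flow_hash: int, rps_map=RPS_MAP) -> int:
    """Select the target CPU for a packet from its flow hash."""
    return rps_map[reciprocal_scale(flow_hash, len(rps_map))]


for name, flow_hash in LOGGED:
    print(f"{name}: hash={flow_hash} -> cpu{pick_cpu(flow_hash)}")
```

Under that assumed map the ACK is steered to cpu2 and both fragments to cpu1, which matches what the stage 2 log shows.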
- stage 2, cpu2
One of them reaches tcp_check_req() slightly earlier,
gets own_req == true from tcp_v4_syn_recv_sock(), and a valid nsk is returned.
The connection then goes to the ESTABLISHED state.
- stage 2, cpu1
But the other, later one gets own_req == false,
and NULL is returned for nsk because own_req is false in inet_csk_complete_hashdance().
So the earlier packet is handled successfully but the later one is discarded.
Depending on scheduling timing, either the ACK or the data packet can be the one discarded. (We have seen both.)
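
The outcome described for stage 2 can be modelled with a toy simulation. This is not kernel code: RequestSock, Result and handle are hypothetical names, and the race is replaced by an explicit processing order. It only models the behaviour reported here, where both packets find the same request socket and only the first one to claim it gets own_req == true.

```python
from dataclasses import dataclass, field


@dataclass
class RequestSock:
    """Toy stand-in for the request socket in NEW_SYN_RECV."""
    claimed: bool = False

    def syn_recv_sock(self) -> bool:
        """Return own_req: True only for the first caller."""
        own_req = not self.claimed
        self.claimed = True
        return own_req


@dataclass
class Result:
    established_by: str = ""
    delivered: list = field(default_factory=list)
    discarded: list = field(default_factory=list)


def handle(order):
    """Process (name, payload_len) packets in the given order."""
    req = RequestSock()
    res = Result()
    for name, payload_len in order:
        if req.syn_recv_sock():
            # Winner: the child socket goes to ESTABLISHED and
            # any payload carried by this packet is delivered.
            res.established_by = name
            if payload_len:
                res.delivered.append(name)
        else:
            # Loser: own_req is false, no child socket is returned,
            # so the packet is dropped.
            res.discarded.append(name)
    return res


ACK = ("ack", 0)
DATA = ("data", 1460)

ack_first = handle([ACK, DATA])
data_first = handle([DATA, ACK])
print(ack_first.discarded, ack_first.delivered)    # ['data'] []
print(data_first.discarded, data_first.delivered)  # ['ack'] ['data']
```

When the data packet wins, nothing is lost. When the ACK wins, the 1460-byte payload is dropped and has to be retransmitted.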
If the ACK is discarded, that is fine:
the TCP state goes to ESTABLISHED via the ACK piggybacked on the data packet, and the payload is delivered to the upper layer.
But if the data packet is discarded, the client cannot receive the payload it should.
This is the problem we are facing.
The server does retransmit the dropped packet (No.6, 7), but only after a delay of a few seconds (about 3 seconds in this capture).
Because this happened during IMS call setup, it shows up as a call connection delay.
This is a serious problem for a call service.
Do you have any reports of this, or a plan to fix it?
best regards,
bae.
--------------------------------------------------------
배 석 진 (Bae Souk-Jin)
System R&D Group 2
Mobile Device Division Telecommunication Business
SAMSUNG ELECTRONICS CO. LTD
Mobile : 82-10-2888-2200
E-mail : soukjin.bae@...sung.com
--------------------------------------------------------