Message-ID: <20180207021647epcms1p128701b4f7e63a88838dd132221119571@epcms1p1>
Date:   Wed, 07 Feb 2018 11:16:47 +0900
From:   배석진 <soukjin.bae@...sung.com>
To:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: [Android][Kernel][TCP/IP] report of packet discarding during tcp
 handshaking

Hello, 
this is Bae, working at Samsung Electronics.

we have a problem where a packet is discarded during the TCP 3-way handshake.
it looks like Mr. Dumazet already tried to fix a similar issue in this patch, https://android.googlesource.com/kernel/common/+/5e0724d027f0548511a2165a209572d48fe7a4c8
but we are still facing another corner case.

this problem needs the following preconditions.
(1) the last ACK packet of the 3-way handshake and the next packet arrive at almost the same time
(2) the next packet, the first data packet, is fragmented
(3) RPS is enabled


[tcp dump]
No.     A-Time         Source     Destination  Len   Seq  Info 
 1  08:35:18.115259  193.81.6.70  10.217.0.47  84     0   [SYN] Seq=0 Win=21504 Len=0 MSS=1460 
 2  08:35:18.115888  10.217.0.47  193.81.6.70  84     0   6100 → 5063 [SYN, ACK] Seq=0 Ack=1 Win=29200 Len=0 MSS=1460 
 3  08:35:18.142385  193.81.6.70  10.217.0.47  80     1   5063 → 6100 [ACK] Seq=1 Ack=1 Win=21504 Len=0 
 4  08:35:18.142425  193.81.6.70  10.217.0.47  1516       Fragmented IP protocol (proto=Encap Security Payload 50, off=0, ID=6e24) [Reassembled in #5] 
 5  08:35:18.142449  193.81.6.70  10.217.0.47  60     1   5063 → 6100 [ACK] Seq=1 Ack=1 Win=21504 Len=1460 [TCP segment of a reassembled PDU] 
 6  08:35:21.227070  193.81.6.70  10.217.0.47  1516       Fragmented IP protocol (proto=Encap Security Payload 50, off=0, ID=71e9) [Reassembled in #7] 
 7  08:35:21.227191  193.81.6.70  10.217.0.47  60     1   [TCP Retransmission] 5063 → 6100 [ACK] Seq=1 Ack=1 Win=21504 Len=1460 
 8  08:35:21.228822  10.217.0.47  193.81.6.70  80     1   6100 → 5063 [ACK] Seq=1 Ack=1461 Win=32120 Len=0

- the last ACK packet of the handshake (No.3) and the next data packet (No.4, 5) arrived with a gap of just 40us.


[kernel log]
- stage 1 
<3>[ 1037.669229] I[0:  system_server: 3778] get_rps_cpu: skb(64), check hash value:3412396090 
<3>[ 1037.669261] I[0:  system_server: 3778] get_rps_cpu: skb(1500), check hash value:158575680 
<3>[ 1037.669285] I[0:  system_server: 3778] get_rps_cpu: skb(44), check hash value:158575680 
- stage 2 
<3>[ 1037.669541] I[1: Binder:3778_13: 8391] tcp_v4_rcv: Enter! skb(seq:A93E087B, len:1480) 
<3>[ 1037.669552] I[2:Jit thread pool:12990] tcp_v4_rcv: Enter! skb(seq:A93E087B, len:20) 
<3>[ 1037.669564] I[2:Jit thread pool:12990] tcp_v4_rcv: check sk_state:12 skb(seq:A93E087B, len:20) 
<3>[ 1037.669585] I[2:Jit thread pool:12990] tcp_check_req, Enter!: skb(seq:A93E087B, len:20) 
<3>[ 1037.669612] I[1: Binder:3778_13: 8391] tcp_v4_rcv: check sk_state:12 skb(seq:A93E087B, len:1480) 
<3>[ 1037.669625] I[1: Binder:3778_13: 8391] tcp_check_req, Enter!: skb(seq:A93E087B, len:1480) 
<3>[ 1037.669653] I[2:Jit thread pool:12990] tcp_check_req, skb(seq:A93E087B, len:20), own_req:1 
<3>[ 1037.669668] I[1: Binder:3778_13: 8391] tcp_check_req, skb(seq:A93E087B, len:1480), own_req:0 
<3>[ 1037.669708] I[2:Jit thread pool:12990] tcp_rcv_state_process, Established: skb(seq:A93E087B, len:20) 
<3>[ 1037.669724] I[1: Binder:3778_13: 8391] tcp_v4_rcv: discard_relse skb(seq:A93E087B, len:1480)

- stage 1
because the data packet was fragmented (No.4 & 5),
RPS hashed it to a different core (cpu1) from the one that got the last ACK packet (cpu2).
so the last ACK and the data packet were handled on different cores almost simultaneously, in the NEW_SYN_RECV state.
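
a minimal sketch of why the two packets can land on different cores. it assumes the flow hash leaves out the port numbers for a fragmented packet, which fits the log above, where both fragments share one hash value that differs from the ACK's. the CRC32 hash, NUM_CPUS and the helper names are hypothetical stand-ins, not the kernel's real hash or API.

```python
import zlib

NUM_CPUS = 4  # hypothetical number of CPUs in the RPS mask


def flow_key(src, dst, sport, dport, fragmented):
    """Return the fields a flow hash could use for one packet.

    Assumption: for an IP fragment the ports are not used,
    so only the two addresses go into the key.
    """
    if fragmented:
        return (src, dst)
    return (src, dst, sport, dport)


def toy_rps_cpu(key):
    """Map a flow key to a CPU index with a stand-in hash (CRC32)."""
    return zlib.crc32(repr(key).encode()) % NUM_CPUS


# Addresses and ports taken from the tcp dump above.
ack_key = flow_key("193.81.6.70", "10.217.0.47", 5063, 6100, fragmented=False)
frag1_key = flow_key("193.81.6.70", "10.217.0.47", 5063, 6100, fragmented=True)
frag2_key = flow_key("193.81.6.70", "10.217.0.47", 5063, 6100, fragmented=True)

for name, key in (("ack", ack_key), ("frag1", frag1_key), ("frag2", frag2_key)):
    print(name, key, "-> cpu", toy_rps_cpu(key))
```

both fragments always get the same key, so they go to the same CPU. the ACK uses a different key, so nothing ties it to that CPU.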

- stage 2, cpu2
one of them is handled in tcp_check_req() slightly earlier,
so it gets the true value for own_req from tcp_v4_syn_recv_sock(), and a valid nsk is returned.
the connection finally goes to the ESTABLISHED state.

- stage 2, cpu1
but the other, later one gets the false value for own_req,
and NULL is returned for nsk, because own_req is false in inet_csk_complete_hashdance().
so the earlier packet is handled successfully but the later one is discarded.
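
the two stage 2 cases can be modelled with a small, deterministic sketch. it is a toy model of the behaviour described above, not kernel code: RequestSock, check_req() and handle_packets() are hypothetical names, and the real locking and hashing are left out.

```python
class RequestSock:
    """Toy stand-in for a request socket in the NEW_SYN_RECV state."""

    def __init__(self):
        self.claimed = False  # becomes True once a child socket was created

    def check_req(self):
        """Return (child, own_req); only the first caller owns the request."""
        own_req = not self.claimed
        self.claimed = True
        # Without own_req no child socket is handed back to the caller.
        child = "child_sk" if own_req else None
        return child, own_req


def handle_packets(order):
    """Process packets that all looked up the same request socket.

    `order` is the order in which the CPUs reach check_req().
    """
    req = RequestSock()
    results = {}
    for name in order:
        child, own_req = req.check_req()
        results[name] = "processed" if child is not None else "discarded"
    return results


print(handle_packets(["ack", "data"]))  # data packet loses the race
print(handle_packets(["data", "ack"]))  # ack loses the race
```

whichever packet reaches check_req() second is discarded, and it does not matter whether it carries payload.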

at this point, either the ACK or the data packet can be discarded, depending on scheduling timing. (we saw both cases)
if the ACK is discarded, that's ok.
the TCP state goes to ESTABLISHED through the ACK piggybacked on the data packet, and the payload is delivered to the upper layer.
but if the data packet is discarded, the client can't receive the payload it needs.
this is the problem we faced.


although the server retransmits the dropped packet (No.6, 7), that only happens after a delay of about 3 seconds.
because this problem occurred during IMS call setup, it shows up as a call connection delay.
this is a serious problem for call service.
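
the two time gaps can be checked directly from the timestamps in the tcp dump above. this is only arithmetic on the capture times; the helper name gap() is hypothetical.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"  # format of the A-Time column in the tcp dump


def gap(earlier, later):
    """Return the time difference between two capture timestamps."""
    return datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)


# No.3 (last handshake ACK) to No.4 (first fragment of the data packet)
ack_to_data = gap("08:35:18.142385", "08:35:18.142425")

# No.4 (first fragment, dropped) to No.6 (first fragment of the retransmission)
data_to_retx = gap("08:35:18.142425", "08:35:21.227070")

print("ack to data:", ack_to_data.microseconds, "us")
print("data to retransmission:", data_to_retx.total_seconds(), "s")
```

the first gap is 40 microseconds and the second is a little over 3 seconds.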

do you have any reports about this, or a plan to fix it?


best regards,
bae.



-------------------------------------------------------- 
  배 석 진 (Bae Souk-Jin) 
   System R&D Group 2
   Mobile Device Division Telecommunication Business
   SAMSUNG ELECTRONICS CO. LTD

   Mobile : 82-10-2888-2200
   E-mail : soukjin.bae@...sung.com
--------------------------------------------------------
