Message-ID: <1517977874.3715.153.camel@gmail.com>
Date: Tue, 06 Feb 2018 20:31:14 -0800
From: Eric Dumazet <eric.dumazet@...il.com>
To: soukjin.bae@...sung.com,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [Android][Kernel][TCP/IP] report of packet discarding during
tcp handshaking
On Wed, 2018-02-07 at 11:16 +0900, 배석진 wrote:
> Hello,
> this is Bae, working at Samsung Electronics.
>
Hi Bae, thanks for this detailed report and analysis.
> We have a problem where a packet is discarded during the TCP 3-way handshake.
> It looks like Mr. Dumazet already tried to fix a similar issue in this patch: https://android.googlesource.com/kernel/common/+/5e0724d027f0548511a2165a209572d48fe7a4c8
This patch was fixing a more serious bug, since that one could cause
corruptions and kernel crashes.
What you describe is simply a packet being dropped, with a very low
probability.
That is not a huge deal for TCP, since the packet will eventually be
retransmitted.
Also, most TCP stacks/flows set the DF bit so that packets cannot be
fragmented...
Anyway, see my answer at the end of this mail.
> but we are still facing another corner case.
>
> this problem requires the following preconditions:
> (1) the last ACK packet of the 3-way handshake and the next packet arrive at almost the same time
> (2) the next packet, the first data packet, is fragmented
> (3) RPS is enabled
>
>
> [tcp dump]
> No. A-Time Source Destination Len Seq Info
> 1 08:35:18.115259 193.81.6.70 10.217.0.47 84 0 [SYN] Seq=0 Win=21504 Len=0 MSS=1460
> 2 08:35:18.115888 10.217.0.47 193.81.6.70 84 0 6100 → 5063 [SYN, ACK] Seq=0 Ack=1 Win=29200 Len=0 MSS=1460
> 3 08:35:18.142385 193.81.6.70 10.217.0.47 80 1 5063 → 6100 [ACK] Seq=1 Ack=1 Win=21504 Len=0
> 4 08:35:18.142425 193.81.6.70 10.217.0.47 1516 Fragmented IP protocol (proto=Encap Security Payload 50, off=0, ID=6e24) [Reassembled in #5]
> 5 08:35:18.142449 193.81.6.70 10.217.0.47 60 1 5063 → 6100 [ACK] Seq=1 Ack=1 Win=21504 Len=1460 [TCP segment of a reassembled PDU]
> 6 08:35:21.227070 193.81.6.70 10.217.0.47 1516 Fragmented IP protocol (proto=Encap Security Payload 50, off=0, ID=71e9) [Reassembled in #7]
> 7 08:35:21.227191 193.81.6.70 10.217.0.47 60 1 [TCP Retransmission] 5063 → 6100 [ACK] Seq=1 Ack=1 Win=21504 Len=1460
> 8 08:35:21.228822 10.217.0.47 193.81.6.70 80 1 6100 → 5063 [ACK] Seq=1 Ack=1461 Win=32120 Len=0
>
> - the last ACK of the handshake (No. 3) and the next data packet (No. 4, 5) arrived with only a 40 µs gap.
>
>
> [kernel log]
> - stage 1
> <3>[ 1037.669229] I[0: system_server: 3778] get_rps_cpu: skb(64), check hash value:3412396090
> <3>[ 1037.669261] I[0: system_server: 3778] get_rps_cpu: skb(1500), check hash value:158575680
> <3>[ 1037.669285] I[0: system_server: 3778] get_rps_cpu: skb(44), check hash value:158575680
> - stage 2
> <3>[ 1037.669541] I[1: Binder:3778_13: 8391] tcp_v4_rcv: Enter! skb(seq:A93E087B, len:1480)
> <3>[ 1037.669552] I[2:Jit thread pool:12990] tcp_v4_rcv: Enter! skb(seq:A93E087B, len:20)
> <3>[ 1037.669564] I[2:Jit thread pool:12990] tcp_v4_rcv: check sk_state:12 skb(seq:A93E087B, len:20)
> <3>[ 1037.669585] I[2:Jit thread pool:12990] tcp_check_req, Enter!: skb(seq:A93E087B, len:20)
> <3>[ 1037.669612] I[1: Binder:3778_13: 8391] tcp_v4_rcv: check sk_state:12 skb(seq:A93E087B, len:1480)
> <3>[ 1037.669625] I[1: Binder:3778_13: 8391] tcp_check_req, Enter!: skb(seq:A93E087B, len:1480)
> <3>[ 1037.669653] I[2:Jit thread pool:12990] tcp_check_req, skb(seq:A93E087B, len:20), own_req:1
> <3>[ 1037.669668] I[1: Binder:3778_13: 8391] tcp_check_req, skb(seq:A93E087B, len:1480), own_req:0
> <3>[ 1037.669708] I[2:Jit thread pool:12990] tcp_rcv_state_process, Established: skb(seq:A93E087B, len:20)
> <3>[ 1037.669724] I[1: Binder:3778_13: 8391] tcp_v4_rcv: discard_relse skb(seq:A93E087B, len:1480)
>
> - stage 1
> because the data packet was fragmented (No. 4 & 5),
> RPS hashed it to a different core (cpu1) than the last ACK packet (cpu2).
> so the last ACK and the data packet were handled on different cores almost simultaneously, in NEW_SYN_RECV state.
>
> - stage 2, cpu2
> one of the packets is processed in tcp_check_req() slightly earlier;
> it gets own_req == true from tcp_v4_syn_recv_sock(), a valid nsk is returned,
> and the connection finally goes to the ESTABLISHED state.
>
> - stage 2, cpu1
> but another, later one is got the false value for own_req,
> and return null for nsk, because of own_req value is false in inet_csk_complete_hashdance().
> so earlier packet was handled successfully but later one has gone to discard.
>
> at this point, either the ACK or the data packet may be the one discarded, depending on scheduling timing (we have seen both).
> if the ACK is discarded, that's OK:
> the TCP state goes to ESTABLISHED by piggybacking on the data packet, and the payload is delivered to the upper layer.
> but if the data packet is discarded, the client cannot receive the payload it should get.
> this is the problem we are facing.
>
>
> the server does retransmit the dropped packet (No. 6, 7), but only after a delay of a few seconds.
> since this problem occurred during IMS call setup, it shows up as a call-connection delay.
> such a delay is a serious problem for a call service.
>
> do you have any reports of this, or plans to fix it?
I guess the following patch should help; can you test it?
Thanks!
include/net/tcp.h | 3 ++-
net/ipv4/tcp_input.c | 4 +++-
net/ipv4/tcp_ipv4.c | 9 ++++++++-
net/ipv4/tcp_minisocks.c | 3 ++-
net/ipv6/tcp_ipv6.c | 9 ++++++++-
5 files changed, 23 insertions(+), 5 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 58278669cc5572ec397f028b5888f574764e298a..fe4616978ccf2b048ca1b7ad8976558563798785 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -374,7 +374,8 @@ enum tcp_tw_status tcp_timewait_state_process(struct inet_timewait_sock *tw,
struct sk_buff *skb,
const struct tcphdr *th);
struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
- struct request_sock *req, bool fastopen);
+ struct request_sock *req, bool fastopen,
+ bool *lost_race);
int tcp_child_process(struct sock *parent, struct sock *child,
struct sk_buff *skb);
void tcp_enter_loss(struct sock *sk);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index cfa51cfd2d999342018aee3511d5760a17b1493b..20c68a4765235daec01a020913d4779d6926d203 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5870,10 +5870,12 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
tp->rx_opt.saw_tstamp = 0;
req = tp->fastopen_rsk;
if (req) {
+ bool lost_race;
+
WARN_ON_ONCE(sk->sk_state != TCP_SYN_RECV &&
sk->sk_state != TCP_FIN_WAIT1);
- if (!tcp_check_req(sk, skb, req, true))
+ if (!tcp_check_req(sk, skb, req, true, &lost_race))
goto discard;
}
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 95738aa0d8a60697053d763ba7023be2c62d7a90..d72de3d12b4c960c3bd1ad4c45fc493ee919201c 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1672,6 +1672,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
if (sk->sk_state == TCP_NEW_SYN_RECV) {
struct request_sock *req = inet_reqsk(sk);
struct sock *nsk;
+ bool lost_race;
sk = req->rsk_listener;
if (unlikely(tcp_v4_inbound_md5_hash(sk, skb))) {
@@ -1688,15 +1689,21 @@ int tcp_v4_rcv(struct sk_buff *skb)
*/
sock_hold(sk);
refcounted = true;
+ lost_race = false;
nsk = NULL;
if (!tcp_filter(sk, skb)) {
th = (const struct tcphdr *)skb->data;
iph = ip_hdr(skb);
tcp_v4_fill_cb(skb, iph, th);
- nsk = tcp_check_req(sk, skb, req, false);
+ nsk = tcp_check_req(sk, skb, req, false, &lost_race);
}
if (!nsk) {
reqsk_put(req);
+ if (lost_race) {
+ tcp_v4_restore_cb(skb);
+ sock_put(sk);
+ goto lookup;
+ }
goto discard_and_relse;
}
if (nsk == sk) {
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index a8384b0c11f8fa589e2ed5311899b62c80a269f8..7ebc222854f7d7215850f86456aeb6298659b6fa 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -578,7 +578,7 @@ EXPORT_SYMBOL(tcp_create_openreq_child);
struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
struct request_sock *req,
- bool fastopen)
+ bool fastopen, bool *lost_race)
{
struct tcp_options_received tmp_opt;
struct sock *child;
@@ -785,6 +785,7 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
sock_rps_save_rxhash(child, skb);
tcp_synack_rtt_meas(child, req);
+ *lost_race = !own_req;
return inet_csk_complete_hashdance(sk, child, req, own_req);
listen_overflow:
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index a1ab29e2ab3bb99e120920186be46d2dd4bbb71e..211570bd524ddc3eddadec0b58c587935ad9a42c 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1451,6 +1451,7 @@ static int tcp_v6_rcv(struct sk_buff *skb)
if (sk->sk_state == TCP_NEW_SYN_RECV) {
struct request_sock *req = inet_reqsk(sk);
struct sock *nsk;
+ bool lost_race;
sk = req->rsk_listener;
if (tcp_v6_inbound_md5_hash(sk, skb)) {
@@ -1464,15 +1465,21 @@ static int tcp_v6_rcv(struct sk_buff *skb)
}
sock_hold(sk);
refcounted = true;
+ lost_race = false;
nsk = NULL;
if (!tcp_filter(sk, skb)) {
th = (const struct tcphdr *)skb->data;
hdr = ipv6_hdr(skb);
tcp_v6_fill_cb(skb, hdr, th);
- nsk = tcp_check_req(sk, skb, req, false);
+ nsk = tcp_check_req(sk, skb, req, false, &lost_race);
}
if (!nsk) {
reqsk_put(req);
+ if (lost_race) {
+ tcp_v6_restore_cb(skb);
+ sock_put(sk);
+ goto lookup;
+ }
goto discard_and_relse;
}
if (nsk == sk) {