Message-ID: <4CBFE924.2060103@gmail.com>
Date:	Thu, 21 Oct 2010 15:17:56 +0800
From:	Li Yu <raise.sail@...il.com>
To:	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: TCP always advertises zero window.

Hi,

	We found this on RHEL 5.4 (kernel 2.6.18-164.11.1.el5), and we suspect that recent kernels have a similar problem.

	First, we turned off both the TCP window scaling option and the MTU probing feature. We then found that some servers keep advertising a zero receive window to the other end. Below is some captured traffic (from tcpdump -S -nn -vv):

16:24:59.990545 IP (tos 0x0, ttl  64, id 37079, offset 0, flags [DF], proto: TCP (6), length: 52) 10.1.157.3.2904 > 10.1.157.4.2903: ., cksum 0x96df (correct), 3830348746:3830348746(0) ack 1951026211 win 65160 <nop,nop,timestamp 1040455485 1040632013>
16:25:00.054563 IP (tos 0x0, ttl  64, id 47424, offset 0, flags [DF], proto: TCP (6), length: 460) 10.1.157.4.2903 > 10.1.157.3.2904: P 1951026211:1951026619(408) ack 3830348746 win 0 <nop,nop,timestamp 1040632077 1040455485>
16:25:00.054579 IP (tos 0x0, ttl  64, id 37080, offset 0, flags [DF], proto: TCP (6), length: 52) 10.1.157.3.2904 > 10.1.157.4.2903: ., cksum 0x94c7 (correct), 3830348746:3830348746(0) ack 1951026619 win 65160 <nop,nop,timestamp 1040455549 1040632077>
16:25:01.451253 IP (tos 0x0, ttl  64, id 47425, offset 0, flags [DF], proto: TCP (6), length: 4148) 10.1.157.4.2903 > 10.1.157.3.2904: P 1951026619:1951030715(4096) ack 3830348746 win 0 <nop,nop,timestamp 1040633474 1040455549>
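
To see at a glance which side advertises which window, the capture can be reduced to (source address, window) pairs. This is a minimal sketch that assumes the tcpdump text layout shown above; the names LINE_RE, SAMPLE and advertised_windows are made up for illustration.

```python
import re

# Matches "<src ip>.<src port> > <dst ip>.<dst port>: ... win <n>"
# as printed by tcpdump -nn in the capture above.
LINE_RE = re.compile(
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): .*? win (?P<win>\d+)"
)

# Two lines copied from the capture above.
SAMPLE = [
    "16:24:59.990545 IP (tos 0x0, ttl  64, id 37079, offset 0, flags [DF], "
    "proto: TCP (6), length: 52) 10.1.157.3.2904 > 10.1.157.4.2903: ., "
    "cksum 0x96df (correct), 3830348746:3830348746(0) ack 1951026211 "
    "win 65160 <nop,nop,timestamp 1040455485 1040632013>",
    "16:25:00.054563 IP (tos 0x0, ttl  64, id 47424, offset 0, flags [DF], "
    "proto: TCP (6), length: 460) 10.1.157.4.2903 > 10.1.157.3.2904: "
    "P 1951026211:1951026619(408) ack 3830348746 win 0 "
    "<nop,nop,timestamp 1040632077 1040455485>",
]


def advertised_windows(lines):
    """Return a list of (source address, advertised window) per parsable line."""
    result = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            result.append((match.group("src"), int(match.group("win"))))
    return result


for src, win in advertised_windows(SAMPLE):
    print(src, "advertises win", win)
```

For these two lines, 10.1.157.3 advertises 65160 while 10.1.157.4 advertises 0.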

	As the example above shows, 10.1.157.4 keeps advertising a zero window forever. I wrote a small toy kernel module to dump the internal TCP socket state, shown below:

tcp-snapshot:sock:
  sk->sk_family=2
  sk->sk_state=1
  sk->sk_reuse=1
  sk->sk_bound_dev_if=0
  atomic_read(&sk->sk_refcnt)=3
  sk->sk_hash=117920776
  sk->sk_shutdown=0
  sk->sk_no_check=0
  sk->sk_userlocks=7
  sk->sk_protocol=6
  sk->sk_type=1
  sk->sk_rcvbuf=131072
  list_empty(&sk->sk_sleep->task_list)=0
  atomic_read(&sk->sk_rmem_alloc)=0
  atomic_read(&sk->sk_wmem_alloc)=0
  atomic_read(&sk->sk_omem_alloc)=0
  sk->sk_receive_queue.qlen=0
  sk->sk_write_queue.qlen=0
  sk->sk_async_wait_queue.qlen=0
  sk->sk_error_queue.qlen=0
  sk->sk_wmem_queued=0
  sk->sk_forward_alloc=8192
  sk->sk_allocation=d0
  sk->sk_sndbuf=131072
  sk->sk_route_caps=1143a9
  sk->sk_gso_type=1
  sk->sk_rcvlowat=1
  sk->sk_flags=300
  sk->sk_lingertime=0
  sk->sk_err=0
  sk->sk_err_soft=0
  sk->sk_ack_backlog=0
  sk->sk_max_ack_backlog=128
  sk->sk_priority=0
  sk->sk_rcvtimeo=9223372036854775807
  sk->sk_sndtimeo=9223372036854775807
  sk->sk_protinfo=0000000000000000
  sk->sk_stamp.tv_sec=18446744073709551615
  sk->sk_stamp.tv_usec=18446744073709551615
  sk->sk_socket=ffff81053ee71080
  sk->sk_user_data=0000000000000000
  sk->sk_sndmsg_page=ffff8103761ab220
  sk->sk_sndmsg_off=475
  sk->sk_send_head=0000000000000000
  sk->sk_write_pending=0
tcp-snapshot:inet_sock:
  inetsk->daddr=39d010a
  inetsk->rcv_saddr=49d010a
  inetsk->dport=580b
  inetsk->num=b57
  inetsk->saddr=49d010a
  inetsk->uc_ttl=4294967295
  inetsk->cmsg_flags=0
  inetsk->opt=0000000000000000
  inetsk->sport=570b
  inetsk->id=5843
  inetsk->tos=0
  inetsk->mc_ttl=64
  inetsk->pmtudisc=1
  inetsk->recverr=0
  inetsk->is_icsk=1
  inetsk->freebind=0
  inetsk->hdrincl=0
  inetsk->mc_loop=1
  inetsk->mc_index=2
  inetsk->mc_addr=0
  inetsk->mc_list=0000000000000000
tcp-snapshot:inet_connection_sk
  icsk->icsk_accept_queue.rskq_defer_accept=0
  icsk->icsk_accept_queue.listen_opt=0000000000000000
  icsk->icsk_timeout=5336784156
  icsk->icsk_rto=218
  icsk->icsk_pmtu_cookie=1500
  icsk->icsk_ca_state=0
  icsk->icsk_retransmits=0
  icsk->icsk_pending=0
  icsk->icsk_backoff=0
  icsk->icsk_syn_retries=0
  icsk->icsk_probes_out=0
  icsk->icsk_ext_hdr_len=0
  icsk->icsk_ack.pending=0
  icsk->icsk_ack.quick=0
  icsk->icsk_ack.pingpong=1
  icsk->icsk_ack.blocked=0
  icsk->icsk_ack.ato=40
  icsk->icsk_ack.timeout=5303454287
  icsk->icsk_ack.lrcvtime=1008486952
  icsk->icsk_ack.last_seg_size=6814
  icsk->icsk_ack.rcv_mss=8688
  icsk->icsk_mtup.enabled=0
  icsk->icsk_mtup.search_high=1500
  icsk->icsk_mtup.search_low=564
  icsk->icsk_mtup.probe_size=0
tcp-snapshot:tcp_sock
  tcpsk->tcp_header_len=32
  tcpsk->pred_flags=0
  tcpsk->rcv_nxt=3830348746
  tcpsk->snd_nxt=1984376345
  tcpsk->snd_una=1984376345
  tcpsk->snd_sml=1984376345
  tcpsk->rcv_tstamp=1041816640
  tcpsk->lsndtime=1041816640
  tcpsk->ucopy.prequeue.qlen=0
  tcpsk->ucopy.task=0000000000000000
  tcpsk->ucopy.iov=0000000000000000
  tcpsk->ucopy.memory=0
  tcpsk->ucopy.len=0
  tcpsk->snd_wl1=3830348746
  tcpsk->snd_wnd=65160
  tcpsk->max_window=65524
  tcpsk->mss_cache=1448
  tcpsk->xmit_size_goal=31856
  tcpsk->window_clamp=65535
  tcpsk->rcv_ssthresh=5792
  tcpsk->frto_highmark=0
  tcpsk->reordering=3
  tcpsk->frto_counter=0
  tcpsk->nonagle=1
  tcpsk->keepalive_probes=0
  tcpsk->srtt=121
  tcpsk->mdev=76
  tcpsk->mdev_max=200
  tcpsk->rttvar=203
  tcpsk->rtt_seq=1984376345
  tcpsk->packets_out=0
  tcpsk->left_out=0
  tcpsk->retrans_out=0
  tcpsk->rx_opt.ts_recent_stamp=1287564284
  tcpsk->rx_opt.ts_recent=1041640111
  tcpsk->rx_opt.rcv_tsval=1041640111
  tcpsk->rx_opt.rcv_tsecr=1041816640
  tcpsk->rx_opt.saw_tstamp=1
  tcpsk->rx_opt.tstamp_ok=1
  tcpsk->rx_opt.dsack=0
  tcpsk->rx_opt.wscale_ok=0
  tcpsk->rx_opt.sack_ok=5
  tcpsk->rx_opt.snd_wscale=0
  tcpsk->rx_opt.rcv_wscale=0
  tcpsk->rx_opt.eff_sacks=0
  tcpsk->rx_opt.num_sacks=0
  tcpsk->rx_opt.user_mss=0
  tcpsk->rx_opt.mss_clamp=1460
  tcpsk->snd_ssthresh=4
  tcpsk->snd_cwnd=4
  tcpsk->snd_cwnd_cnt=4
  tcpsk->snd_cwnd_clamp=65535
  tcpsk->snd_cwnd_used=2
  tcpsk->snd_cwnd_stamp=1041816640
  tcpsk->out_of_order_queue.qlen=0
  tcpsk->rcv_wnd=0
  tcpsk->rcv_wup=3830348746
  tcpsk->write_seq=1984376345
  tcpsk->pushed_seq=1984376345
  tcpsk->copied_seq=3830348746
  tcpsk->duplicate_sack[0].start_seq=3613713418
  tcpsk->duplicate_sack[0].end_seq=3613714866
  tcpsk->selective_acks[i].start_seq=3648234364
  tcpsk->selective_acks[i].end_seq=3648247396
  tcpsk->selective_acks[i].start_seq=3647855528
  tcpsk->selective_acks[i].end_seq=3647856976
  tcpsk->selective_acks[i].start_seq=3640487648
  tcpsk->selective_acks[i].end_seq=3640496336
  tcpsk->selective_acks[i].start_seq=3498843984
  tcpsk->selective_acks[i].end_seq=3498845432
  tcpsk->recv_sack_cache[i].start_seq=1226527628
  tcpsk->recv_sack_cache[i].end_seq=1226549030
  tcpsk->recv_sack_cache[i].start_seq=179088461
  tcpsk->recv_sack_cache[i].end_seq=179091357
  tcpsk->recv_sack_cache[i].start_seq=4042009662
  tcpsk->recv_sack_cache[i].end_seq=4042011110
  tcpsk->recv_sack_cache[i].start_seq=0
  tcpsk->recv_sack_cache[i].end_seq=0
  tcpsk->lost_skb_hint=0000000000000000
  tcpsk->scoreboard_skb_hint=0000000000000000
  tcpsk->retransmit_skb_hint=0000000000000000
  tcpsk->forward_skb_hint=0000000000000000
  tcpsk->fastpath_skb_hint=0000000000000000
  tcpsk->fastpath_cnt_hint=15
  tcpsk->lost_cnt_hint=6
  tcpsk->retransmit_cnt_hint=0
  tcpsk->forward_cnt_hint=9
  tcpsk->advmss=1448
  tcpsk->prior_ssthresh=5
  tcpsk->lost_out=0
  tcpsk->sacked_out=0
  tcpsk->fackets_out=0
  tcpsk->high_seq=1226549030
  tcpsk->retrans_stamp=0
  tcpsk->undo_marker=0
  tcpsk->undo_retrans=1
  tcpsk->urg_seq=0
  tcpsk->urg_data=0
  tcpsk->urg_mode=0
  tcpsk->ecn_flags=0
  tcpsk->snd_up=0
  tcpsk->total_retrans=2110
  tcpsk->bytes_acked=0
  tcpsk->keepalive_time=0
  tcpsk->keepalive_intvl=0
  tcpsk->linger2=0
  tcpsk->last_synq_overflow=0
  tcpsk->rcv_rtt_est.rtt=15
  tcpsk->rcv_rtt_est.seq=3830352454
  tcpsk->rcv_rtt_est.time=1008486951
  tcpsk->rcvq_space.space=468244
  tcpsk->rcvq_space.seq=3830258350
  tcpsk->rcvq_space.time=1008486952
  tcpsk->mtu_probe.probe_seq_start=0
  tcpsk->mtu_probe.probe_seq_end=0
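
The inet_sock address and port fields in the dump can be matched against the tcpdump capture. This is a minimal sketch that assumes the module printed the raw network-byte-order fields with %x on a little-endian host (the 64-bit pointers suggest x86_64, but that is an assumption); the helper names are made up for illustration.

```python
import socket
import struct


def decode_be32_addr(hex_value: str) -> str:
    """Decode a __be32 address that was printed with %x on a little-endian host."""
    raw = struct.pack("<I", int(hex_value, 16))  # recover the in-memory byte order
    return socket.inet_ntoa(raw)


def decode_be16_port(hex_value: str) -> int:
    """Decode a __be16 port that was printed with %x on a little-endian host."""
    raw = struct.pack("<H", int(hex_value, 16))
    return struct.unpack(">H", raw)[0]  # reinterpret as network byte order


# inetsk->daddr / inetsk->dport: the peer
print(decode_be32_addr("39d010a"), decode_be16_port("580b"))
# inetsk->saddr / inetsk->sport: the local side
print(decode_be32_addr("49d010a"), decode_be16_port("570b"))
```

Under that assumption the snapshot belongs to the 10.1.157.4:2903 side of the connection to 10.1.157.3:2904, which is the side advertising the zero window. inetsk->num=b57 is already in host byte order and is also 2903.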

	We noticed that tcpsk->rcv_wnd is indeed 0, but tcpsk->copied_seq equals tcpsk->rcv_nxt and sk->sk_rmem_alloc is 0. The latter two facts mean that there is no pending data in the receive queue.
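
The check described above can be written out as follows. This is a minimal sketch using values copied from the dump; the function names are made up for illustration, and the subtraction is done modulo 2**32 because TCP sequence numbers wrap.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers wrap at 2**32


def unread_bytes(rcv_nxt: int, copied_seq: int) -> int:
    """Bytes received in order but not yet copied to user space."""
    return (rcv_nxt - copied_seq) % SEQ_MOD


def receive_side_is_idle(rcv_nxt: int, copied_seq: int, rmem_alloc: int) -> bool:
    """True when nothing is waiting to be read and no receive memory is charged."""
    return unread_bytes(rcv_nxt, copied_seq) == 0 and rmem_alloc == 0


# Values taken from the tcp-snapshot dump.
snapshot = {
    "rcv_nxt": 3830348746,
    "copied_seq": 3830348746,
    "rmem_alloc": 0,
    "rcv_wnd": 0,
}

idle = receive_side_is_idle(
    snapshot["rcv_nxt"], snapshot["copied_seq"], snapshot["rmem_alloc"]
)
print("receive side idle:", idle, "advertised window:", snapshot["rcv_wnd"])
```

The receive side is idle and yet the window is 0, which is the inconsistency this report is about.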

	After some digging in the source code, I found that __tcp_select_window() actually returns zero in this case. In my opinion, the function should restore the window to a non-zero value at this point (the receive queue is completely free). Is that right?

	In this case, I think tcpsk->rcv_ssthresh has an exceptional value: it is too small, which causes the rest of the processing in __tcp_select_window() to be skipped and leaves the window at zero forever.
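
The effect can be reproduced with a simplified user-space model of the path without window scaling. This is not kernel code: it assumes the free space is first clamped to rcv_ssthresh and then rounded down to a multiple of rcv_mss, and it leaves out the buffer overhead adjustment and the memory-pressure handling. The name select_window and its parameters are made up for illustration.

```python
def select_window(free_space, full_space, rcv_ssthresh, rcv_mss, cur_window):
    """Simplified model of window selection without window scaling."""
    mss = min(rcv_mss, full_space)

    # Buffer more than half full and less than one MSS free: advertise nothing.
    if free_space < full_space // 2 and free_space < mss:
        return 0

    # The window never exceeds rcv_ssthresh.
    free_space = min(free_space, rcv_ssthresh)

    # Keep the current offer if it is within one MSS of the usable space,
    # otherwise round the usable space down to a multiple of the MSS.
    window = cur_window
    if window <= free_space - mss or window > free_space:
        window = (free_space // mss) * mss
    return window


# Values from the snapshot: sk_rcvbuf=131072 with nothing queued,
# window_clamp=65535, rcv_ssthresh=5792, icsk_ack.rcv_mss=8688, rcv_wnd=0.
stuck = select_window(131072, 65535, 5792, 8688, 0)

# Same state, but with rcv_mss equal to advmss (1448) for comparison.
healthy = select_window(131072, 65535, 5792, 1448, 0)

print(stuck, healthy)
```

In this model the snapshot values give a window of 0 because rcv_ssthresh (5792) is smaller than a single rcv_mss (8688), so no whole segment fits. With rcv_mss at 1448 the same rcv_ssthresh would give 5792.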

	According to the source code, only a successful MTU probe or the receipt of a segment with a non-zero-length L7 payload can grow tcp_sock->rcv_ssthresh. Because we turned off MTU probing and TCP only receives zero-window probes from the other end, it seems that we never get a chance to update tcp_sock->rcv_ssthresh at all, so we end up in a dead loop.
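
The dead loop can be illustrated with a toy model of this argument. It is only a sketch under stated assumptions: the window is the largest multiple of rcv_mss that fits under rcv_ssthresh, rcv_ssthresh grows only when payload arrives, the peer sends payload only when the window is non-zero, and the growth step of 2896 bytes is a made-up number. None of the names come from the kernel.

```python
def advertised_window(rcv_ssthresh: int, rcv_mss: int) -> int:
    """Largest multiple of rcv_mss that fits under rcv_ssthresh (simplified)."""
    return (rcv_ssthresh // rcv_mss) * rcv_mss


def simulate(rcv_ssthresh: int, rcv_mss: int, rounds: int, growth: int = 2896):
    """Return the window advertised in each round of the toy model."""
    history = []
    for _ in range(rounds):
        window = advertised_window(rcv_ssthresh, rcv_mss)
        history.append(window)
        # With a zero window the peer only probes, carrying no payload here.
        payload = min(window, rcv_mss)
        if payload > 0:
            rcv_ssthresh += growth  # hypothetical growth step
    return history


print(simulate(5792, 8688, 5))  # snapshot values: never leaves zero
print(simulate(5792, 1448, 3))  # rcv_mss equal to advmss: window opens up
```

With the snapshot values every round advertises 0, so no payload arrives and rcv_ssthresh never changes. With a smaller rcv_mss the first window is already non-zero and the model can make progress.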

	It seems that skb_data_queue() is missing some processing that would keep tcpsk->rcv_ssthresh consistent with the free space in the receive queue. Is that right? Or have I missed something or misunderstood how this works?


	Thank you~

Yu



