Message-Id: <20100812074041.cf62b793.akpm@linux-foundation.org>
Date:	Thu, 12 Aug 2010 07:40:41 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	netdev@...r.kernel.org
Cc:	bugzilla-daemon@...zilla.kernel.org,
	bugme-daemon@...zilla.kernel.org, yuriy@...z.com
Subject: Re: [Bugme-new] [Bug 16568] New: Regression and incompatibility
 with Windows SP2-SP3-Vista TCP stack causing lost connections


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).


On Thu, 12 Aug 2010 08:20:01 GMT bugzilla-daemon@...zilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=16568
> 
>            Summary: Regression and incompatibility with Windows
>                     SP2-SP3-Vista TCP stack causing lost connections
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 2.6.30+
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: IPV4
>         AssignedTo: shemminger@...ux-foundation.org
>         ReportedBy: yuriy@...z.com
>         Regression: No
> 
> 
> Hi.
> I administer about 50 heavily loaded web servers (free CMS hosting) running
> Linux. At the beginning of the year most of them ran kernels between 2.6.24
> and 2.6.29, and I tuned the TCP sysctls to improve protection against DDoS
> and other kinds of flooding (our servers are attacked rather often).
> tcp_tw_recycle=1 was among those settings, since many guides on the net
> recommend it and the Linux documentation does not warn against it. Because
> of periodic kernel panics caused by bugs in Ethernet card drivers and ext3,
> and after finding that 2.6.31+ kernels work faster with ext3, I upgraded
> almost all servers to 2.6.32.8, which had already been under test on several
> servers for several months.
> Some time after that we began to receive complaints from our users (site
> owners) that they and their visitors found the sites very unstable. It
> looked as if HTTP connections were simply lost at random. Not everybody had
> the problem, only a small percentage. We first suspected internet providers
> or buggy firewalls, but finally concluded that the problem was on our
> servers. Analyzing lost connections with tcpdump, I found that the client
> host sends packets BUT LINUX JUST IGNORES THEM: the SYN packet was repeated
> 3 times at 3-second intervals, but there was NO SYN-ACK reply.
> Users of Windows SP3 were affected most (almost all users with SP3 had the
> problem). I booted one server with the old 2.6.24 kernel and the problem
> disappeared. I then looked for the exact kernel version that introduced the
> incompatibility. Using binary search I compiled several kernels between
> 2.6.24 and 2.6.32.8 and found that 2.6.29.6 DOES NOT have the problem, but
> 2.6.30 DOES. Studying the commits made to tcp_input.c and tcp_ipv4.c (which
> I supposed were involved) between those releases, I found this one:
>   author    Eric Dumazet <dada1@...mosbay.com>    
>     Wed, 11 Mar 2009 16:23:57 +0000 (09:23 -0700)
>   committer    David S. Miller <davem@...emloft.net>    
>     Wed, 11 Mar 2009 16:23:57 +0000 (09:23 -0700)
>   commit    fc1ad92dfc4e363a055053746552cdb445ba5c57
> 
>   tcp: allow timestamps even if SYN packet has tsval=0
> 
>   Some systems send SYN packets with apparently wrong RFC1323 timestamp
>   option values [timestamp tsval=0 tsecr=0].
>   It might be for security reasons (http://www.secuobs.com/plugs/25220.shtml )
>   Linux TCP stack ignores this option and sends back a SYN+ACK packet
>   without timestamp option, thus many TCP flows cannot use timestamps
>   and lose some benefit of RFC1323.
>   Other operating systems seem to not care about initial tsval value, and let
>   tcp flows to negotiate timestamp option.
> 
>   net/ipv4/tcp_ipv4.c         diff :
> 
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1226,15 +1226,6 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
>         if (want_cookie && !tmp_opt.saw_tstamp)
>                 tcp_clear_options(&tmp_opt);
> 
> -       if (tmp_opt.saw_tstamp && !tmp_opt.rcv_tsval) {
> -               /* Some OSes (unknown ones, but I see them on web server, which
> -                * contains information interesting only for windows'
> -                * users) do not send their stamp in SYN. It is easy case.
> -                * We simply do not advertise TS support.
> -                */
> -               tmp_opt.saw_tstamp = 0;
> -               tmp_opt.tstamp_ok  = 0;
> -       }
>         tmp_opt.tstamp_ok = tmp_opt.saw_tstamp;
> 
>         tcp_openreq_init(req, &tmp_opt, skb);
> 
> Removing that was not a good idea. Having analyzed lost connections from SP3
> clients, I know that they have timestamps turned on and send a timestamp
> value of 0. Here is one such SYN:
> 13:39:10.430498 IP 192.168.99.130.3493 > 192.168.99.100.80: S 2507911465:2507911465(0) win 65535 <mss 1460,nop,wscale 3,nop,nop,timestamp 0 0,nop,nop,sackOK>
>         0x0000:  4500 0040 2bda 4000 8006 86a6 c0a8 6382  E..@+.@.......c.
>         0x0010:  c0a8 6364 0da5 0050 957b b129 0000 0000  ..cd...P.{.)....
>         0x0020:  b002 ffff 992c 0000 0204 05b4 0103 0303  .....,..........
>         0x0030:  0101 080a 0000 0000 0000 0000 0101 0402  ................
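To make the capture easier to check, the 24 option bytes that follow the fixed
TCP header (offset 0x28 in the dump) can be decoded by hand. The following is
only a minimal user-space sketch, not kernel code: `struct ts_opt`,
`parse_ts_option` and `syn_options` are names invented for this illustration,
with field names borrowed from tmp_opt in the quoted source.

```c
#include <stddef.h>
#include <stdint.h>

/* Result of scanning a TCP option block for the RFC 1323 timestamp option.
 * Hypothetical type for illustration; field names mirror tmp_opt. */
struct ts_opt {
    int saw_tstamp;      /* 1 if a well-formed timestamp option was found */
    uint32_t rcv_tsval;  /* sender's timestamp value */
    uint32_t rcv_tsecr;  /* echoed timestamp value */
};

/* Read a 32-bit big-endian (network order) value. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Walk the option bytes: kind 0 ends the list, kind 1 is a one-byte NOP,
 * every other option is laid out as kind, length, data. */
struct ts_opt parse_ts_option(const uint8_t *opt, size_t len)
{
    struct ts_opt out = {0, 0, 0};
    size_t i = 0;

    while (i < len) {
        uint8_t kind = opt[i];

        if (kind == 0)
            break;
        if (kind == 1) {
            i++;
            continue;
        }
        if (i + 1 >= len)
            break;                      /* truncated: no length byte */

        uint8_t olen = opt[i + 1];
        if (olen < 2 || i + olen > len)
            break;                      /* malformed length */

        if (kind == 8 && olen == 10) {  /* timestamp option */
            out.saw_tstamp = 1;
            out.rcv_tsval = read_be32(opt + i + 2);
            out.rcv_tsecr = read_be32(opt + i + 6);
        }
        i += olen;
    }
    return out;
}

/* Option bytes copied from offsets 0x28..0x3f of the dump above. */
const uint8_t syn_options[24] = {
    0x02, 0x04, 0x05, 0xb4, 0x01, 0x03, 0x03, 0x03,
    0x01, 0x01, 0x08, 0x0a, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x01, 0x01, 0x04, 0x02,
};
```

Calling parse_ts_option(syn_options, sizeof syn_options) on these bytes
reports saw_tstamp = 1 with rcv_tsval and rcv_tsecr both 0, which agrees with
the "timestamp 0 0" in the tcpdump summary line.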
> 
> With the above code fragment removed, we get tmp_opt.tstamp_ok=1, as I
> understand it. But a little later the source of tcp_ipv4.c reads:
>         /* VJ's idea. We save last timestamp seen
>          * from the destination in peer table, when entering
>          * state TIME-WAIT, and check against it before
>          * accepting new connection request.
>          *
>          * If "isn" is not zero, this request hit alive
>          * timewait bucket, so that all the necessary checks
>          * are made in the function processing timewait state.
>          */
>         if (tmp_opt.saw_tstamp &&
>             tcp_death_row.sysctl_tw_recycle &&
>             (dst = inet_csk_route_req(sk, req)) != NULL &&
>             (peer = rt_get_peer((struct rtable *)dst)) != NULL &&
>             peer->v4daddr == saddr) {
>             if ((u32)get_seconds() - peer->tcp_ts_stamp < TCP_PAWS_MSL &&
>                 (s32)(peer->tcp_ts - req->ts_recent) >
>                             TCP_PAWS_WINDOW) {
>                 NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
>                 goto drop_and_release;
>             }
>         }
> which, when tmp_opt.saw_tstamp and tcp_death_row.sysctl_tw_recycle are both
> true and time-wait sockets from the peer have not yet been closed, leads to
> packets being ignored, seemingly at random.
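The condition in the quoted fragment can be tried outside the kernel. The
sketch below is an illustration under stated assumptions, not the kernel's
code: `struct peer_entry` and `paws_passive_reject` are hypothetical names, the
two constants are set to the values I believe include/net/tcp.h used at the
time (60 and 1), and the scenario assumes an earlier connection from the same
address left a nonzero timestamp in the peer entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values; check include/net/tcp.h in the kernel being examined. */
#define TCP_PAWS_MSL    60  /* seconds a cached per-host timestamp is trusted */
#define TCP_PAWS_WINDOW 1   /* tolerated backwards step, in timestamp ticks */

/* Hypothetical stand-in for the two fields of the peer entry used here. */
struct peer_entry {
    uint32_t tcp_ts;        /* last timestamp value seen from this address */
    uint32_t tcp_ts_stamp;  /* time in seconds when tcp_ts was recorded */
};

/* Returns 1 when the quoted condition would send the SYN to
 * drop_and_release, 0 when the SYN would be accepted.
 * "now" plays the role of get_seconds(), "ts_recent" of req->ts_recent. */
int paws_passive_reject(int saw_tstamp, int tw_recycle,
                        const struct peer_entry *peer,
                        uint32_t now, uint32_t ts_recent)
{
    if (!saw_tstamp || !tw_recycle || peer == NULL)
        return 0;

    return (now - peer->tcp_ts_stamp < TCP_PAWS_MSL) &&
           ((int32_t)(peer->tcp_ts - ts_recent) > TCP_PAWS_WINDOW);
}
```

With a peer entry recorded 10 seconds ago holding tcp_ts = 123456, a new SYN
carrying timestamp 0 gives 123456 - 0 > 1 and is rejected. The same SYN is
accepted once 60 seconds have passed, or when tw_recycle is 0, which would fit
the intermittent failures described above.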
> 
> As for me, I understand that I should not enable tw_recycle, BUT THE
> DOCUMENTATION DOES NOT STATE that enabling it will cause random and rather
> frequent loss of connections from some popular types of clients (like
> Windows).
> As for the commit above, it should include something to prevent the
> condition above from becoming true when tmp_opt.rcv_tsval==0. I'm not sure,
> but something like
>         if (tmp_opt.saw_tstamp &&
> +           tmp_opt.rcv_tsval &&
>             tcp_death_row.sysctl_tw_recycle &&
>             (dst = inet_csk_route_req(sk, req)) != NULL &&
>             (peer = rt_get_peer((struct rtable *)dst)) != NULL &&
> 
> just to avoid a regression and a serious TCP-stack incompatibility when
> tw_recycle is enabled.
> The documentation also does not state that tw_recycle should not be used at
> all on internet servers: web clients behind NAT will run into the same
> condition, because successive connections from different clients that share
> one IP address can carry incompatible timestamps.
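The two points above (the proposed rcv_tsval guard and the NAT case) can be
compared in a small user-space sketch. This is an illustration under
assumptions, not a tested kernel patch: `reject_with_tsval_guard` and
`struct peer_entry` are hypothetical names, the constants are assumed to be 60
and 1, and the SYN's rcv_tsval is assumed to be what ends up in
req->ts_recent.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values; check include/net/tcp.h in the kernel being examined. */
#define TCP_PAWS_MSL    60
#define TCP_PAWS_WINDOW 1

/* Hypothetical stand-in for the peer entry fields used by the check. */
struct peer_entry {
    uint32_t tcp_ts;        /* last timestamp value seen from this address */
    uint32_t tcp_ts_stamp;  /* time in seconds when tcp_ts was recorded */
};

/* The quoted check with the extra "tmp_opt.rcv_tsval &&" term suggested in
 * the report. Returns 1 if the SYN would be dropped, 0 if accepted. */
int reject_with_tsval_guard(int tw_recycle, const struct peer_entry *peer,
                            uint32_t now, int saw_tstamp, uint32_t rcv_tsval)
{
    if (!saw_tstamp || !rcv_tsval || !tw_recycle || peer == NULL)
        return 0;   /* a zero tsval now skips the per-host check entirely */

    return (now - peer->tcp_ts_stamp < TCP_PAWS_MSL) &&
           ((int32_t)(peer->tcp_ts - rcv_tsval) > TCP_PAWS_WINDOW);
}
```

Suppose two hosts share one public address and the first left tcp_ts =
5000000 in the peer entry five seconds ago. A SYN with tsval 0 is accepted
thanks to the guard, but a SYN from the second host with tsval 20000 is still
rejected, because 5000000 - 20000 > 1. So the guard would address the Windows
case only; the NAT case would remain for as long as tw_recycle is enabled.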
> 
> Sorry if I have distracted somebody busy from their work with my unimportant
> problem.
> 
