Message-Id: <8C4F7938-DCA1-46F5-A3C9-7DF62511DEE7@bengler.no>
Date:	Tue, 6 Jan 2015 18:17:26 +0000
From:	Erik Grinaker <erik@...gler.no>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	linux-kernel@...r.kernel.org, Yuchung Cheng <ycheng@...gle.com>,
	netdev <netdev@...r.kernel.org>
Subject: Re: TCP connection issues against Amazon S3


> On 06 Jan 2015, at 17:20, Eric Dumazet <eric.dumazet@...il.com> wrote:
> On Tue, 2015-01-06 at 16:11 +0000, Erik Grinaker wrote:
>>> On 06 Jan 2015, at 16:04, Eric Dumazet <eric.dumazet@...il.com> wrote:
>>> On Tue, 2015-01-06 at 15:14 +0000, Erik Grinaker wrote:
>>>> (CCing Yuchung, as his name comes up in the relevant commits)
>>>> 
>>>> After upgrading from Ubuntu 12.04.5 to 14.04.1 we have begun seeing
>>>> intermittent TCP connection hangs for HTTP image requests against
>>>> Amazon S3. 3-5% of requests will suddenly stall in the middle of the
>>>> transfer before timing out. We see this problem across a range of
>>>> servers, in several data centres and networks, all located in Norway.
>>>> 
>>>> A packet dump [1] shows repeated ACK retransmits for some of the
>>>> requests. Using Ubuntu mainline kernels, we found the problem to have
>>>> been introduced between 3.11.10 and 3.12.0, possibly in
>>>> 0f7cc9a3c2bd89b15720dbf358e9b9e62af27126. The problem is also present
>>>> in 3.18.1. Disabling tcp_window_scaling seems to solve it, but has
>>>> obvious drawbacks for transfer speeds. Other sysctls do not seem to
>>>> affect it.
>>>> 
>>>> I am not sure if this is fundamentally a kernel bug or a network
>>>> issue, but we did not see this problem with older kernels.
>>>> 
>>>> [1] http://abstrakt.bengler.no/tcp-issues-s3.pcap.bz2
>>> 
>>> 
>>> CC netdev
>>> 
>>> This looks like the bug we fixed here:
>>> 
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=39bb5e62867de82b269b07df900165029b928359
>> 
>> Has that patch gone into a release? Because the problem persists with 3.18.1.
> 
> Patch is in 3.18.1 yes.
> 
> So that's a separate issue.
> 
> Can you confirm the pcap was taken at the receiver (195.159.221.106), not
> the sender (54.231.136.74), and which host is running the 'buggy kernel'?

Yes, the pcap was taken on the receiver (195.159.221.106).

> If the sender is broken, changing the kernel on the receiver won't help.
> 
> BTW not using sack (on 54.231.132.98) is terrible for performance in
> lossy environments.

It may well be that the sender is broken; however, the sender is Amazon S3, so I do not have any control over it. In any case, the problem goes away with 3.11.10 on the receiver but persists with 3.12.0 (or later) on the receiver, so some change in 3.12.0 must be triggering it.

If you are confident that the problem is with Amazon, I can get in touch with their engineering department.
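
To put a rough number on why disabling tcp_window_scaling is not a usable workaround: without scaling, the advertised window is limited to the 16-bit field (65535 bytes), and a single connection cannot move more than one window per round trip. Below is a minimal sketch of that arithmetic. The function name and the 40 ms RTT are illustrative assumptions, not values measured from the pcap.

```python
def max_throughput_bytes_per_sec(window_field: int,
                                 scale_shift: int,
                                 rtt_seconds: float) -> float:
    """Upper bound on single-connection TCP throughput: window / RTT.

    window_field -- the 16-bit window value carried in the TCP header
    scale_shift  -- window scale shift count (0 when scaling is disabled)
    rtt_seconds  -- round-trip time; the caller supplies an assumed value
    """
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field is 16 bits wide")
    if not 0 <= scale_shift <= 14:
        raise ValueError("window scale shift is limited to 14")
    if rtt_seconds <= 0:
        raise ValueError("RTT must be positive")
    # The effective window is the header field shifted left by the scale.
    window_bytes = window_field << scale_shift
    return window_bytes / rtt_seconds


if __name__ == "__main__":
    ASSUMED_RTT = 0.040  # hypothetical 40 ms round trip, not measured
    for shift in (0, 7):
        bound = max_throughput_bytes_per_sec(0xFFFF, shift, ASSUMED_RTT)
        print(f"shift={shift}: at most {bound / 1e6:.1f} MB/s")
```

With the assumed 40 ms RTT, the unscaled case comes out at roughly 1.6 MB/s per connection, which is why I would rather not ship with scaling turned off.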