Message-ID: <33316bd2-da03-0dbb-bd41-4ff44eb81402@del.bg>
Date:   Mon, 19 Feb 2018 18:17:11 +0200
From:   Teodor Milkov <tm@....bg>
To:     Neal Cardwell <ncardwell@...gle.com>
Cc:     Netdev <netdev@...r.kernel.org>, Yuchung Cheng <ycheng@...gle.com>
Subject: Re: [PATCH net] tcp: restrict F-RTO to work-around broken
 middle-boxes

On 19.02.2018 15:38, Neal Cardwell wrote:
> On Sun, Feb 18, 2018 at 4:02 PM, Teodor Milkov <tm@....bg> wrote:
>> Hello,
>>
>> I've had numerous reports from Windows users that after a kernel upgrade
>> from 4.9 to 4.14 they experienced major slowdowns and transfer stalls.
>>
>> After some digging, I found that the slowness starts with this commit:
>>
>>   tcp: extend F-RTO to catch more spurious timeouts (89fe18e44)
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=89fe18e44f7ee5ab1c90d0dff5835acee7751427
>>
>> which was later partially reverted by this one:
>>
>>   tcp: restrict F-RTO to work-around broken middle-boxes (cc663f4d4)
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cc663f4d4c97b7297fb45135ab23cfd508b35a77
>>
>> But we still had stalls until I fully reverted 89fe18e44.
> Thanks for the report. Do you have any other details that might help
> evaluate this issue?

I'm sorry I didn't provide more info. It was a long session.

> Any packet traces, by any chance?

I'll try to obtain one.

> Were the affected connections web browsing, videos, file transfer, etc?

The first reports were from POP3 users. When we asked them to try a file
transfer, the problem persisted.

It seems the slowdowns/stalls aren't severe enough to frustrate web
browsing.

> Were there non-Windows users in this population that did not seem to be
> affected by the stalls?

All reports were from Windows users. I was able to partially reproduce 
the problem only using Windows as well. Linux & Mac OS X are apparently 
immune.

> Was the bottleneck primarily Ethernet, wifi, cellular, cable modem, etc?

In my test case it is a 100 Mbit/s long-haul MAN (Ethernet, 1 ms), and
there's a 75 Mbit/s shaper on top of it, set up by one of our ISPs. I'm not
sure what kind of shaper/policer this is.

With 4.4 and 4.9 kernels, as well as with a patched 4.14, I get a very
steady ~6 MB/s. Otherwise it's up to 3 MB/s, with frequent slowdowns below
500 KB/s and an average speed of about 1 MB/s.

Reporting customers were on all kinds of connectivity, from cellular to
cable, and reported regressions from as much as 1 MB/s (with a good kernel)
down to 50 KB/s. I suspect that the higher the RTT, the lower the speed.

> Any middleboxes (firewall, NAT, etc) between the servers and users?

In my test there's a Linux stateful firewall, yes. I'm not sure about the
other reporters.

> Does "stall" mean that the connection permanently froze, or temporarily slowed down but eventually
> recovered?

In most cases it is a severe slowdown, which eventually recovers.
Occasionally there were complete freezes, but these are rather rare.

I've deployed 4.14.20 with 89fe18e44 completely reverted and so far 
feedback from customers is positive.

Thank you very much for your attention to this.
