Message-ID: <33316bd2-da03-0dbb-bd41-4ff44eb81402@del.bg>
Date: Mon, 19 Feb 2018 18:17:11 +0200
From: Teodor Milkov <tm@....bg>
To: Neal Cardwell <ncardwell@...gle.com>
Cc: Netdev <netdev@...r.kernel.org>, Yuchung Cheng <ycheng@...gle.com>
Subject: Re: [PATCH net] tcp: restrict F-RTO to work-around broken
middle-boxes
On 19.02.2018 15:38, Neal Cardwell wrote:
> On Sun, Feb 18, 2018 at 4:02 PM, Teodor Milkov <tm@....bg> wrote:
>> Hello,
>>
>> I've numerous reports from Windows users that after kernel upgrade from 4.9
>> to 4.14 they experienced major slow downs and transfer stalls.
>>
>> After some digging, I found that the slowness starts with this commit:
>>
>> tcp: extend F-RTO to catch more spurious timeouts (89fe18e44)
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=89fe18e44f7ee5ab1c90d0dff5835acee7751427
>>
>> Which is partially reverted later with this one:
>>
>> tcp: restrict F-RTO to work-around broken middle-boxes (cc663f4d4)
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cc663f4d4c97b7297fb45135ab23cfd508b35a77
>>
>> But, still, we had stalls until I fully reverted 89fe18e44.
> Thanks for the report. Do you have any other details that might help
> evaluate this issue?
I'm sorry I didn't provide more info. It was a long session.
> Any packet traces, by any chance?
I'll try and obtain one.
> Were the affected connections web browsing, videos, file transfer, etc?
The first reports were from POP3 users. When we asked them to try a file
transfer, the problem persisted.
The slowdowns and stalls don't seem severe enough to frustrate web
browsing.
> Were there non-Windows users in this population that did not seem to be
> affected by the stalls?
All reports were from Windows users. I was also able to partially
reproduce the problem only with Windows clients. Linux & Mac OS X are
apparently immune.
> Was the bottleneck primarily Ethernet, wifi, cellular, cable modem, etc?
In my test case it is a 100 Mbit/s long-haul MAN (Ethernet, 1 ms) with a
75 Mbit/s shaper on top of it, set up by one of our ISPs. I'm not sure
what kind of shaper/policer it is.
With 4.4 and 4.9 kernels, as well as patched 4.14, I get a very steady ~6
MB/s. Otherwise it's up to 3 MB/s, with frequent slowdowns below 500
KB/s and an average speed of about 1 MB/s.
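To put these numbers next to the 75 Mbit/s shaper limit, here is a rough conversion sketch. It assumes 1 MB = 10^6 bytes and 1 KB = 10^3 bytes (I haven't checked which convention the client reports), and the names `SHAPER_MBIT` and `to_mbit` are made up for illustration only.

```python
# Convert the transfer rates quoted above to Mbit/s and compare them
# with the 75 Mbit/s shaper limit.
# Assumption: decimal units (1 MB = 10**6 bytes, 1 KB = 10**3 bytes).

SHAPER_MBIT = 75.0  # shaper limit from the test setup, in Mbit/s


def to_mbit(mbytes_per_s: float) -> float:
    """Convert MB/s to Mbit/s (8 bits per byte)."""
    return mbytes_per_s * 8.0


# Rates in MB/s as quoted in this mail.
rates = {
    "good kernel, steady": 6.0,
    "affected kernel, peak": 3.0,
    "affected kernel, average": 1.0,
    "affected kernel, slowdown": 0.5,
}

for label, mbytes in rates.items():
    mbit = to_mbit(mbytes)
    share = 100.0 * mbit / SHAPER_MBIT
    print(f"{label}: {mbit:.0f} Mbit/s ({share:.0f}% of shaper limit)")
```

Under that assumption the good case uses roughly two thirds of the shaped bandwidth, while the affected average is closer to a tenth of it.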
The customers who reported this were on all kinds of connectivity, from
cellular to cable, with regressions from as low as 1 MB/s (with a good
kernel) down to 50 KB/s. I suspect that the higher the RTT, the lower the
speed.
> Any middleboxes (firewall, NAT, etc) between the servers and users?
In my test there's a stateful Linux firewall, yes. I'm not sure about
the other reporters.
> Does "stall" mean that the connection permanently froze, or temporarily slowed down but eventually
> recovered?
In most cases it is a severe slowdown that eventually recovers.
Occasionally there were complete freezes, but these are rather rare.
I've deployed 4.14.20 with 89fe18e44 completely reverted, and so far
feedback from customers is positive.
Thank you very much for your attention to this.