Message-ID: <dd7f3fd1b08a44328d59116cd64f483a@exmbdft6.ad.twosigma.com>
Date: Sun, 13 Feb 2022 18:52:45 +0000
From: Tian Lan <Tian.Lan@...sigma.com>
To: Eric Dumazet <edumazet@...gle.com>
CC: Tian Lan <tilan7663@...il.com>, netdev <netdev@...r.kernel.org>,
"Andrew Chester" <Andrew.Chester@...sigma.com>
Subject: RE: [PATCH] tcp: allow the initial receive window to be greater than
64KiB
> To be clear, if the sender respects the initial window in the first RTT, then the first ACK it receives allows a much bigger window (at least 2x), allowing for standard slow start behavior, doubling CWND at each RTT.
>
> linux TCP stack is conservative, and wants a proof of remote peer well behaving before opening the gates.
>
> The thing is, we have this issue being discussed every 3 months or so, because some people think the RWIN is never changed or something.
>
> Last time, we asked to not change the stack, and instead suggested users tune it using eBPF if they really need to bypass TCP standards.
>
> https://lkml.org/lkml/2021/12/22/652
I understand that Linux wants to be conservative before opening up the gate, and I fully support that idea. I think the current Linux behavior works well on low-latency networks, but in an environment with high RTT (e.g. 20ms), rcv_wnd really becomes the bottleneck. A 4MiB transfer took approximately 6 * RTT on average, even with a large initial snd_cwnd. I think allowing a larger default rcv_wnd would greatly reduce the number of RTTs required for the transfer.
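As a rough illustration of the round-trip arithmetic, here is a minimal sketch. It assumes the sender is limited only by the receive window (snd_cwnd is large), that each RTT delivers exactly one full window, and that the advertised window doubles every RTT. The function name rtts_to_transfer and the growth factor are illustrative assumptions, not kernel behavior; handshake and ACK timing are ignored.

```python
KIB = 1024
MIB = 1024 * KIB


def rtts_to_transfer(total_bytes, initial_rwnd, growth=2):
    """Count round trips when each RTT delivers one full receive window.

    Hypothetical model: the window starts at initial_rwnd and is
    multiplied by `growth` after every round trip.
    """
    sent = 0
    window = initial_rwnd
    rtts = 0
    while sent < total_bytes:
        sent += window      # one window's worth of data per round trip
        window *= growth    # receiver opens the window for the next RTT
        rtts += 1
    return rtts


if __name__ == "__main__":
    for rwnd in (64 * KIB, 1 * MIB, 4 * MIB):
        print(rwnd // KIB, "KiB initial rwnd ->",
              rtts_to_transfer(4 * MIB, rwnd), "RTTs")
```

Under these assumptions a 64KiB initial window needs 7 round trips for 4MiB, in the same range as the roughly 6 * RTT reported above, while a 1MiB initial window needs 3 and a 4MiB initial window needs 1.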
From my understanding, BPF_SOCK_OPS_RWND_INIT was added to the kernel to allow users to bypass the default if they choose to. Prior to kernel 4.19, the rcv_wnd set via BPF_SOCK_OPS_RWND_INIT could exceed 64KiB, up to the available space. Since then, the initial rwnd has always been limited to 64KiB. This patch would just make the kernel behave as it did prior to 4.19 when rcv_wnd is set by eBPF.
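The difference between the two behaviors described here can be sketched with a toy model. This is not the kernel code: the function initial_rwnd and its cap_at_64k flag are hypothetical, all values are in bytes for simplicity, and the model only encodes the behavior as described in this message (a 64KiB ceiling applied regardless of the eBPF-requested value, versus a ceiling of the available space).

```python
KIB = 1024
MIB = 1024 * KIB
U16_MAX = 65535  # largest window that fits in the unscaled 16-bit field


def initial_rwnd(space, bpf_rwnd=None, cap_at_64k=True):
    """Toy model of initial receive window selection (hypothetical).

    space      -- available receive space in bytes
    bpf_rwnd   -- window requested via eBPF in bytes, or None for default
    cap_at_64k -- True models the behavior since 4.19 as described above,
                  False models the earlier behavior
    """
    wnd = min(space, U16_MAX) if cap_at_64k else space
    if bpf_rwnd is not None:
        # The eBPF value can only lower the window below the current ceiling.
        wnd = min(wnd, bpf_rwnd)
    return wnd


if __name__ == "__main__":
    print(initial_rwnd(4 * MIB, bpf_rwnd=1 * MIB, cap_at_64k=True))
    print(initial_rwnd(4 * MIB, bpf_rwnd=1 * MIB, cap_at_64k=False))
```

In this model, a 1MiB request with 4MiB of space yields 65535 bytes when the cap applies and the full 1MiB when it does not, which is the gap the patch is meant to close.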
What would you suggest for applications that currently rely on setting a larger rcv_wnd via BPF_SOCK_OPS_RWND_INIT? Do you think it would be better to set rcv_wnd after the connection is established?