[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL+tcoDbB2if_=h7XSRU9_i2G=xT+fqmxCU-Mhe438PYcqxj-w@mail.gmail.com>
Date: Sat, 18 May 2024 01:41:23 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Neal Cardwell <ncardwell@...gle.com>
Cc: edumazet@...gle.com, dsahern@...nel.org, kuba@...nel.org,
pabeni@...hat.com, davem@...emloft.net, netdev@...r.kernel.org,
Jason Xing <kernelxing@...cent.com>
Subject: Re: [RFC PATCH net-next] tcp: break the limitation of initial receive window
On Fri, May 17, 2024 at 10:42 PM Neal Cardwell <ncardwell@...gle.com> wrote:
>
> On Fri, May 17, 2024 at 4:50 AM Jason Xing <kerneljasonxing@...il.com> wrote:
> >
> > From: Jason Xing <kernelxing@...cent.com>
> >
> > Since in 2018 one commit a337531b942b ("tcp: up initial rmem to 128KB and
> > SYN rwin to around 64KB") limited received window within 65535, most CDN
> > team would not benefit from this change because they cannot have a large
> > window to receive a big packet one time especially in long RTT.
> >
> > According to RFC 7323, it says:
> > "The maximum receive window, and therefore the scale factor, is
> > determined by the maximum receive buffer space."
> >
> > So we can get rid of this 64k limitation and let the window be tunable if
> > the user wants to do it within the control of buffer space. Then many
> > companies, I believe, can have the same behaviour as old days. Besides,
> > there are many papers conducting various interesting experiments which
> > have something to do with this window and show good outputs in some cases.
> >
> > Signed-off-by: Jason Xing <kernelxing@...cent.com>
> > ---
> > net/ipv4/tcp_output.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index 95caf8aaa8be..95618d0e78e4 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -232,7 +232,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
> > if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows))
> > (*rcv_wnd) = min(space, MAX_TCP_WINDOW);
> > else
> > - (*rcv_wnd) = min_t(u32, space, U16_MAX);
> > + (*rcv_wnd) = space;
>
> Hmm, has this patch been tested? This doesn't look like it would work.
Hello Neal,
Thanks for the comment.
Sure, I provided such a patch a few months ago which has been tested
in production for the customers.
One example of using a much bigger initial receive window:
client ---window=65535---> server
client <---window=14600---- server
client ---window=175616---> server
Then the client could send more data than before in fewer rtt.
Above is the output of tcpdump.
Oh, I just found a similar case:
https://lore.kernel.org/all/20220213040545.365600-1-tilan7663@gmail.com/
Before this, I always believed I'm not the only one who had such an issue.
>
> Please note that RFC 7323 says in
> https://datatracker.ietf.org/doc/html/rfc7323#section-2.2 :
>
> The window field in a segment where the SYN bit is set (i.e., a <SYN>
> or <SYN,ACK>) MUST NOT be scaled.
>
> Since the receive window field in a SYN is unscaled, that means the
> TCP wire protocol has no way to convey a receive window in the SYN
> that is bigger than 64KBytes.
>
> That is why this code places a limit of U16_MAX on the value here.
>
> If you want to advertise a bigger receive window in the SYN, you'll
No. It's not my original intention.
For SYN packet itself is limited in the __tcp_transmit_skb() as below:
th->window = htons(min(tp->rcv_wnd, 65535U));
> need to define a new TCP option type, and write an IETF Internet Draft
> and/or RFC standardizing the new option.
>
> If you would like to, instead, submit a patch with a comment
> explaining that this U16_MAX limit is inherent in the RFC 7323 wire
> protocol specification, that could make sense.
I quoted from that link:
--------
I'm not trying to make the sender to send more than 64Kib in the
first RTT. The change will only make the sender to send more starting
on the second RTT(after first ack received on the data). Instead of
having the rcv_wnd to grow from 64Kib, the rcv_wnd can start from a
much larger base value.
Without the patch:
RTT: 1, 2,
3, ...
rcv_wnd: 64KiB, 192KiB, 576KiB, ...
With the patch (assume rcv_wnd is set to 512KiB):
RTT: 1, 2,
3, ...
rcv_wnd: 64KiB, 1.536MiB, 4.608MiB, ...
--------
Another quotation from the paper [1] which has been deployed in Yahoo:
"By increasing this ICW, small objects stand to be transferred in
fewer RTTs, which when compounded across all objects on a page can cut
down the total page load time significantly.
...
Luckily, on popular operating systems (except Linux which has a much
smaller receive window), the initial receive window is quite large
(64KB-256KB), which would allow for utilizing a larger ICW"
[1]: https://conferences.sigcomm.org/imc/2011/docs/p569.pdf
Thanks,
Jason
>
> best regards,
> neal
Powered by blists - more mailing lists