netdev - Re: [PATCH] tcp: allow the initial receive window to be greater than 64KiB

lists.openwall.net
Open Source and information security mailing list archives



Message-ID: <CANn89iLdcy4qbUUNSpLKoegh8+Nc=edC3WshQ=OasKyWJQ256A@mail.gmail.com>
Date:   Sun, 13 Feb 2022 10:57:58 -0800
From:   Eric Dumazet <edumazet@...gle.com>
To:     Tian Lan <Tian.Lan@...sigma.com>
Cc:     Tian Lan <tilan7663@...il.com>, netdev <netdev@...r.kernel.org>,
        Andrew Chester <Andrew.Chester@...sigma.com>
Subject: Re: [PATCH] tcp: allow the initial receive window to be greater than 64KiB

On Sun, Feb 13, 2022 at 10:52 AM Tian Lan <Tian.Lan@...sigma.com> wrote:
>
> > To be clear, if the sender respects the initial window in the first RTT, then the first ACK it receives allows a much bigger window (at least 2x), allowing for standard slow start behavior, doubling CWND at each RTT.
> >
> > The Linux TCP stack is conservative and wants proof that the remote peer is well behaved before opening the gates.
> >
> > The thing is, we have this issue being discussed every 3 months or so, because some people think the RWIN is never changed or something.
> >
> > Last time, we asked to not change the stack, and instead suggested users tune it using eBPF if they really need to bypass TCP standards.
> >
> > https://lkml.org/lkml/2021/12/22/652
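To make the slow-start argument concrete, here is a minimal Python sketch of an idealized model. It is an assumption for illustration, not the kernel's algorithm: the sender fills its whole window every RTT, the window doubles after each RTT, and the handshake, losses, delayed ACKs and buffer limits are ignored. The function name `rtts_to_transfer` is hypothetical.

```python
def rtts_to_transfer(total_bytes: int, initial_window: int) -> int:
    """Count RTTs needed to deliver total_bytes under an idealized slow start.

    Assumptions (not kernel behaviour): every RTT the sender transmits one
    full window, the window doubles after each RTT, and the handshake,
    losses, delayed ACKs and receive-buffer limits are all ignored.
    """
    if initial_window <= 0:
        raise ValueError("initial_window must be positive")
    sent = 0                  # bytes delivered so far
    window = initial_window   # bytes the sender may put in flight this RTT
    rtts = 0
    while sent < total_bytes:
        sent += window        # one full window per round trip
        window *= 2           # slow start: window doubles per RTT in this model
        rtts += 1
    return rtts


if __name__ == "__main__":
    four_mib = 4 * 1024 * 1024
    print(rtts_to_transfer(four_mib, 64 * 1024))    # 64 KiB first flight
    print(rtts_to_transfer(four_mib, 1024 * 1024))  # hypothetical 1 MiB first flight
```

Under these assumptions a 4 MiB transfer starting from a 64 KiB window needs 7 RTTs (64 + 128 + ... + 2048 KiB is only 4032 KiB after six), while a hypothetical 1 MiB first flight needs 3. When each RTT costs something like 20 ms, the size of the first window dominates the total.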
>
> I totally understand that Linux wants to be conservative before opening up the gate, and I fully support this idea. I think the current Linux behavior is good for low-latency networks, but in an environment with high RTT (e.g. 20 ms), rcv_wnd really becomes the bottleneck. A 4 MiB transfer took approximately 6 RTTs on average, even with a large initial snd_cwnd. I think allowing a larger default rcv_wnd would greatly reduce the number of RTTs required for the transfer.
>
> From my understanding, BPF_SOCK_OPS_RWND_INIT was added to the kernel to allow users to bypass the default if they choose to. Prior to kernel 4.19, the rcv_wnd set via BPF_SOCK_OPS_RWND_INIT could exceed 64KiB, up to the available receive space. Since then, however, the initial rwnd has always been limited to 64KiB. This patch would simply restore the pre-4.19 behavior when rcv_wnd is set by eBPF.
>
> What would you suggest for applications that currently rely on setting a "larger" rcv_wnd via BPF_SOCK_OPS_RWND_INIT? Do you think it would be better to set rcv_wnd after the connection is established?
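The disagreement is about where a 64 KiB clamp sits relative to the eBPF-supplied value. The sketch below is a simplified Python model of the two behaviors as the patch author describes them, not a copy of the kernel's code. The names `initial_rcv_wnd`, `space` and `bpf_init_segments` are hypothetical, and treating the BPF_SOCK_OPS_RWND_INIT value as a count of MSS-sized segments is an assumption.

```python
from typing import Optional

U16_MAX = 65535  # largest value a 16-bit TCP window field can carry


def initial_rcv_wnd(space: int, mss: int, bpf_init_segments: Optional[int],
                    clamp_to_u16: bool) -> int:
    """Simplified (hypothetical) model of picking the initial receive window.

    space: receive space available to the socket, in bytes.
    mss: maximum segment size, in bytes.
    bpf_init_segments: value supplied via BPF_SOCK_OPS_RWND_INIT, ASSUMED to
        be counted in MSS-sized segments; None if no program set it.
    clamp_to_u16: True models the post-4.19 behavior described in the thread
        (never above 64 KiB); False models the earlier behavior.
    """
    rcv_wnd = space
    if clamp_to_u16:
        rcv_wnd = min(rcv_wnd, U16_MAX)  # cap at what a 16-bit field holds
    if bpf_init_segments:
        # The eBPF value can only lower the window further in this model.
        rcv_wnd = min(rcv_wnd, bpf_init_segments * mss)
    return rcv_wnd
```

With 4 MiB of space, an MSS of 1448 and an eBPF value of 100 segments, the unclamped model yields 144800 bytes while the clamped one yields 65535, which is the difference the patch is about.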

I suggest that you do not interpret things as "BPF_SOCK_OPS_RWND_INIT
could exceed 64KiB", because it cannot.
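One wire-level constraint bounds any initial window: the TCP window field is 16 bits wide, and RFC 7323 specifies that the window field of a segment with the SYN bit set (a SYN or SYN-ACK) is never scaled, so the negotiated shift only applies after the handshake. A minimal Python sketch of that rule; the function name `effective_window` is hypothetical.

```python
MAX_WSCALE = 14  # largest shift count RFC 7323 allows
U16_MAX = 65535  # largest raw value of the 16-bit window field


def effective_window(window_field: int, wscale: int, syn: bool) -> int:
    """Receive window in bytes implied by a TCP header's window field.

    window_field: raw 16-bit value from the header.
    wscale: shift count negotiated via the Window Scale option.
    syn: True for SYN and SYN-ACK segments, whose window is never scaled.
    """
    if not 0 <= window_field <= U16_MAX:
        raise ValueError("window field is 16 bits")
    if not 0 <= wscale <= MAX_WSCALE:
        raise ValueError("window scale shift must be 0..14")
    # Scaling only takes effect once the handshake is complete.
    return window_field if syn else window_field << wscale
```

With a shift of 7, the same raw value of 65535 means 65535 bytes in a SYN-ACK but 8388480 bytes in a later segment.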

If you really need to send more than 64KB in the first RTT, TCP is not
the right protocol for the job.
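Put differently, a larger initial congestion window alone cannot push the first flight past that ceiling. A minimal Python sketch, assuming the first flight is limited only by the sender's initial cwnd and the window advertised in the SYN/SYN-ACK; the function name and the 1448-byte MSS are illustrative choices, not values from this thread.

```python
U16_MAX = 65535  # largest window a SYN or SYN-ACK can advertise (unscaled field)


def first_rtt_bytes(init_cwnd_segments: int, mss: int, syn_window: int) -> int:
    """Upper bound on payload bytes in flight before the first ACK arrives.

    Simplified assumption: only the sender's initial congestion window and
    the peer's window advertised in the SYN/SYN-ACK limit the first flight.
    """
    if not 0 <= syn_window <= U16_MAX:
        raise ValueError("a SYN/SYN-ACK window field holds at most 65535")
    return min(init_cwnd_segments * mss, syn_window)
```

With an initial cwnd of 10 segments the sender is cwnd-limited at 14480 bytes; at 100 segments it hits the 65535-byte ceiling instead.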

The 13d3b1ebe287 commit message should have been very clear about the 64K
limitation.