lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 21 May 2024 08:36:21 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: dsahern@...nel.org, kuba@...nel.org, pabeni@...hat.com, 
	davem@...emloft.net, ncardwell@...gle.com, netdev@...r.kernel.org, 
	Jason Xing <kernelxing@...cent.com>
Subject: Re: [RFC PATCH v2 net-next] tcp: remove 64 KByte limit for initial
 tp->rcv_wnd value

Hello Eric,

On Tue, May 21, 2024 at 12:51 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Sat, May 18, 2024 at 4:50 AM Jason Xing <kerneljasonxing@...il.com> wrote:
> >
> > From: Jason Xing <kernelxing@...cent.com>
> >
> > In 2018 commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin
> > to around 64KB") limited the initial value of tp->rcv_wnd to 65535, most
> > CDN team would not benefit from this change because they cannot have a
> > large window to receive a big packet, which will be slowed down especially
> > in long RTT.
> >
> > According to RFC 7323, it says:
> >   "The maximum receive window, and therefore the scale factor, is
> >    determined by the maximum receive buffer space."
>
> This seems not relevant ?  wscale factor is not changed in this patch ?
> tp->rcv_wnd is also not the maximum receive window.

Thanks for your review.

I can remove this part. I was trying to claim I do not break RFC.

>
> >
> > So we can get rid of this 64k limitation and let the window be tunable if
> > the user wants to do it within the control of buffer space. Then many
> > companies, I believe, can have the same behaviour as old days.
>
> Not sure this has ever worked, see below.
>
> Also, the "many companies ..." mention has nothing to do in a changelog.

Oh, I just copied/translated from my initial studies of this rcv_wnd
by reading many papers something like this.

I can also remove this sentence.

>
>
> > Besides,
> > there are many papers conducting various interesting experiments which
> > have something to do with this window and show good outputs in some cases,
> > say, paper [1] in Yahoo! CDN.
>
> I think this changelog is trying hard to sell something, but in
> reality TCP 3WHS nature
> makes your claims wrong.
>
> Instead, you should clearly document that this problem can _not_ be
> solved for both
> active _and_ passive connections.
>
> In the first RTT, a client (active connection) can not send more than
> 64KB, if TCP specs
> are properly applied.

Having a large rcv_wnd if the user can tweak this knob can help
transfer data more rapidly. I'm not referring to the first RTT.

>
> >
> > To avoid future confusion, current change doesn't affect the initial
> > receive window on the wire in a SYN or SYN+ACK packet which are set within
> > 65535 bytes according to RFC 7323 also due to the limit in
> > __tcp_transmit_skb():
> >
> >     th->window      = htons(min(tp->rcv_wnd, 65535U));
> >
> > In one word, __tcp_transmit_skb() already ensures that constraint is
> > respected, no matter how large tp->rcv_wnd is.
> >
> > Let me provide one example if with or without the patch:
> > Before:
> > client   --- SYN: rwindow=65535 ---> server
> > client   <--- SYN+ACK: rwindow=65535 ----  server
> > client   --- ACK: rwindow=65536 ---> server
> > Note: for the last ACK, the calculation is 512 << 7.
> >
> > After:
> > client   --- SYN: rwindow=65535 ---> server
> > client   <--- SYN+ACK: rwindow=65535 ----  server
> > client   --- ACK: rwindow=175232 ---> server
> > Note: I use the following command to make it work:
> > ip route change default via [ip] dev eth0 metric 100 initrwnd 120
> > For the last ACK, the calculation is 1369 << 7.
> >
> > We can pay attention to the last ACK in 3-way shakehand and notice that
> > with the patch applied the window can reach more than 64 KByte.
>
> You carefully avoided mentioning the asymmetry.
> I do not think this is needed in the changelog, because this is adding
> confusion.

What kind of case I've met in production is only about whether we're
capable of sending more data at the same time at the very beginning,
so I care much more about the sending process right now.

>
> >
> > [1]: https://conferences.sigcomm.org/imc/2011/docs/p569.pdf
> >
> > Signed-off-by: Jason Xing <kernelxing@...cent.com>
> > ---
> > v2
> > Link: https://lore.kernel.org/all/20240517085031.18896-1-kerneljasonxing@gmail.com/
> > 1. revise the title and body messages (Neal)
> > ---
> >  net/ipv4/tcp_output.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index 95caf8aaa8be..95618d0e78e4 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -232,7 +232,7 @@ void tcp_select_initial_window(const struct sock *sk, int __space, __u32 mss,
> >         if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_workaround_signed_windows))
> >                 (*rcv_wnd) = min(space, MAX_TCP_WINDOW);
> >         else
> > -               (*rcv_wnd) = min_t(u32, space, U16_MAX);
> > +               (*rcv_wnd) = space;
>
> This is probably breaking some  packetdrill tests, but your change
> might [1] be good,

I'll do some packetdrill tests and get back some information soon.

> especially because it allows DRS behavior to be consistent for large
> MTU (eg MTU 9000) and bigger tcp_rmem[1],
> even without playing with initrwnd attribute.
>
> "ss -temoi " would display after connection setup  rcv_space:89600
> instead of a capped value.
>
> [1] This is hard to say, DRS is full of surprises.

To avoid confusion, I will remove this link and relevant statements.

Here are my opinions in conclusion:
1) this change doesn't break the law, I mean, various RFCs.
2) this change allows us to have the same behaviour as 2018 in this case.
3) this change does some good things to certain cases, especially for
the CDN team.

I'll refine the changelog as far as I can, hoping it will not confuse
the readers.

Thanks,
Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ