netdev - Re: [PATCH net-next 4/4] tcp: remove limit on initial receive window

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <24573.63409.26608.321427@gargle.gargle.HOWL>
Date:   Tue, 12 Jan 2021 11:25:37 -0800
From:   Heath Caldwell <hcaldwel@...mai.com>
To:     Eric Dumazet <edumazet@...gle.com>
CC:     netdev <netdev@...r.kernel.org>, Yuchung Cheng <ycheng@...gle.com>,
        Josh Hunt <johunt@...mai.com>, Ji Li <jli@...mai.com>
Subject: Re: [PATCH net-next 4/4] tcp: remove limit on initial receive window

On 2021-01-12 18:05 (+0100), Eric Dumazet <edumazet@...gle.com> wrote:
> On Tue, Jan 12, 2021 at 5:02 PM Heath Caldwell <hcaldwel@...mai.com> wrote:
> >
> > On 2021-01-12 09:30 (+0100), Eric Dumazet <edumazet@...gle.com> wrote:
> > > I think the whole patch series is an attempt to badly break TCP stack.
> >
> > Can you explain the concern that you have about how these changes might
> > break the TCP stack?
> >
> > Patches 1 and 3 fix clear bugs.
> 
> Not clear to me at least.
> 
> If they were bug fixes, a FIxes: tag would be provided.

The underlying bugs that are addressed in patches 1 and 3 are present in
1da177e4c3f4 ("Linux-2.6.12-rc2") which looks to be the earliest parent
commit in the repository.  What should I do for a Fixes: tag in this
case?

> You are a first time contributor to linux TCP stack, so better make
> sure your claims are solid.

I fear that I may not have expressed the problems and solutions in a
manner that imparted the ideas well.

Maybe I added too much detail in the description for patch 1, which may
have obscured the problem: val is capped to sysctl_rmem_max *before* it
is doubled (resulting in the possibility for sk_rcvbuf to be set to
2*sysctl_rmem_max, rather than it being capped at sysctl_rmem_max).

Maybe I was not explicit enough in the description for patch 3: space is
expanded into sock_net(sk)->ipv4.sysctl_tcp_rmem[2] and sysctl_rmem_max
without first shrinking them to discount the overhead.

> > Patches 2 and 4 might be arguable, though.
> 
> So we have to pick up whatever pleases us ?

I have been treating all of these changes together because they all kind
of work together to provide a consistent model and configurability for
the initial receive window.

Patches 1 and 3 address bugs.
Patch 2 addresses an inconsistency in how overhead is treated specially
for TCP sockets.
Patch 4 addresses the 64KB limit which has been imposed.

If you think that they should be treated separately, I can separate them
to not be combined into a series.  Though tcp_space_from_win(),
introduced by patch 2, is also used by patch 3.

> > Is you objection primarily about the limit removed by patch 4?
> >
> > > Hint : 64K is really the max allowed by TCP standards. Yes, this is
> > > sad, but this is it.
> >
> > Do you mean the limit imposed by the size of the "Window Size" header
> > field?  This limitation is directly addressed by the check in
> > __tcp_transmit_skb():
> >
> >         if (likely(!(tcb->tcp_flags & TCPHDR_SYN))) {
> >                 th->window      = htons(tcp_select_window(sk));
> >                 tcp_ecn_send(sk, skb, th, tcp_header_size);
> >         } else {
> >                 /* RFC1323: The window in SYN & SYN/ACK segments
> >                  * is never scaled.
> >                  */
> >                 th->window      = htons(min(tp->rcv_wnd, 65535U));
> >         }
> >
> > and checking (and capping it there) allows for the field to not overflow
> > while also not artificially restricting the size of the window which
> > will later be advertised (once window scaling is negotiated).
> >
> > > I will not spend hours of work running  packetdrill tests over your
> > > changes, but I am sure they are now quite broken.
> > >
> > > If you believe auto tuning is broken, fix it properly, without trying
> > > to change all the code so that you can understand it.
> >
> > The removal of the limit specifically addresses the situation where auto
> > tuning cannot work: on the initial burst.
> 
> Which standard allows for a very big initial burst ?
> AFAIK, IW10 got a lot of pushback, and that was only for a burst of ~14600 bytes
> Allowing arbitrarily large windows needs IETF approval.

Should not a TCP implementation be generous in what it accepts?

The removal of the limit is not trying to force an increase of the size
of an initially sent burst, it is only trying to allow for the reception
of an initial burst which may have been sent by another host.

A sender should not rely on the receiver's advertised window to prevent
it from causing congestion.

> >  There is no way to know
> > whether an installation desires to receive a larger first burst unless
> > it is specifically configured - and this limit prevents such
> > configuration.
> >
> > > I strongly advise you read RFC 7323 before doing any changes in TCP
> > > stack, and asking us to spend time reviewing your patches.
> >
> > Can you point out the part of the RFC which would be violated by
> > initially (that is, the first packet after the SYN) advertising a window
> > larger than 64KB?
> 
> We handle gradual increase of rwin based on the behavior of
> applications and senders (skb len/truesize ratio)
> 
> Again if you believe autotuning is broken please fix it, instead of
> throwing it away.

Removing the limit is not an attempt to remove the autotuning, it is an
attempt to provide the ability for a user to increase what a TCP
connection can initially receive.  This can be used to configure where
autotuning *starts*.

> Changing initial RWIN can have subtle differences on how fast DRS algo
> can kick in.
> 
> This kind of change would need to be heavily tested, with millions of
> TCP flows in the wild.
> (ie not in a lab environment)
> 
> Have you done this ? What happens if flows are malicious and sent 100
> bytes per segment ?

Removing the limit would not remove protection from such an attack.  If
there was concern over such an attack at a particular installation, the
default sysctl_tcp_rmem[1] of 128KB could be depended upon (along with
the default tcp_adv_win_scale of 1) to achieve the same behavior that
the limit provides.

Instead, the removal of the limit allows for a user who is not concerned
about such an attack to configure an installation such that the
reception of larger than 64KB initial bursts is possible.

Removing the limit does not impose a choice on the user.  Leaving the
limit in place does.