Message-ID: <CADxym3ZiyYK7Vyz05qLv8jOPmNZXXepCsTbZxdkhSQxRx0cdSA@mail.gmail.com>
Date: Thu, 18 May 2023 22:11:51 +0800
From: Menglong Dong <menglong8.dong@...il.com>
To: Neal Cardwell <ncardwell@...gle.com>
Cc: Eric Dumazet <edumazet@...gle.com>, kuba@...nel.org, davem@...emloft.net,
	pabeni@...hat.com, dsahern@...nel.org, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org, Menglong Dong <imagedong@...cent.com>,
	Yuchung Cheng <ycheng@...gle.com>
Subject: Re: [PATCH net-next 3/3] net: tcp: handle window shrink properly

On Thu, May 18, 2023 at 9:40 PM Neal Cardwell <ncardwell@...gle.com> wrote:
>
> On Wed, May 17, 2023 at 10:35 PM Menglong Dong <menglong8.dong@...il.com> wrote:
> >
> > On Wed, May 17, 2023 at 10:47 PM Eric Dumazet <edumazet@...gle.com> wrote:
> > >
> > > On Wed, May 17, 2023 at 2:42 PM <menglong8.dong@...il.com> wrote:
> > > >
> > > > From: Menglong Dong <imagedong@...cent.com>
> > > >
> > > > Window shrink is not allowed and also not handled for now, but it's
> > > > needed in some cases.
> > > >
> > > > In the original logic, a 0-probe is triggered only when there is no
> > > > data in the retransmit queue and the receive window can't hold the
> > > > first packet in the send queue.
> > > >
> > > > Now, let's change it and trigger the 0-probe in these cases:
> > > >
> > > > - the retransmit queue has data, and its first packet is not within
> > > >   the receive window
> > > > - the retransmit queue has no data, and the first packet in the send
> > > >   queue is beyond the end of the receive window
> > >
> > > Sorry, I do not understand.
> > >
> > > Please provide packetdrill tests for new behavior like that.
> > >
> >
> > Yes. The problem can be reproduced easily:
> >
> > 1. Choose a server machine and decrease its tcp_mem with:
> >    echo '1024 1500 2048' > /proc/sys/net/ipv4/tcp_mem
> > 2. Call listen() and accept() on a port, such as 8888. We call
> >    accept() in a loop and never call recv(), so the data stays
> >    in the receive queue.
> > 3. Choose a client machine and create 100 TCP connections to
> >    port 8888 of the server. Then every connection sends about
> >    1M of data.
> > 4. We can see that some of the connections enter the 0-probe
> >    state, but some of them keep retransmitting again and again,
> >    because the server has reached tcp_mem[2] and skbs are
> >    dropped before the receive buffer fills up and the connection
> >    can enter the 0-probe state. Finally, some of these
> >    connections will time out and break.
> >
> > With this series, all the 100 connections will enter the 0-probe
> > state and connection breaks won't happen. And the data transfer
> > will recover if we increase tcp_mem or call recv() on the
> > sockets in the server.
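For reference, the server side of these steps is essentially the
following (a minimal, untested sketch; only port 8888 and the
accept-without-recv loop come from the steps above, the rest is
ordinary socket boilerplate):

/* Step 1 (as root): echo '1024 1500 2048' > /proc/sys/net/ipv4/tcp_mem
 * Then run this server and point the 100 client connections at it.
 * Accepted sockets are never read, so incoming data sits in the
 * kernel receive queues while tcp_mem stays artificially small.
 */
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	struct sockaddr_in addr;
	int lfd = socket(AF_INET, SOCK_STREAM, 0);

	if (lfd < 0) {
		perror("socket");
		return 1;
	}

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(8888);

	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lfd, 128) < 0) {
		perror("bind/listen");
		return 1;
	}

	for (;;) {
		/* Accept but never recv(); the socket is intentionally
		 * left open and unread. */
		int cfd = accept(lfd, NULL, NULL);

		if (cfd < 0) {
			perror("accept");
			break;
		}
	}
	return 0;
}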
> > > Also, such a fundamental change would need IETF discussion first.
> > > We do not want Linux to cause network collapses just because
> > > billions of devices send more zero probes.
> >
> > I think it may be a good idea to make the connection enter
> > 0-probe, rather than drop the skb silently. What 0-probe means
> > is waiting for space to become available when the buffer of the
> > receive queue is full. And maybe we can also use 0-probe when
> > the "buffer" of the "TCP protocol" (which means tcp_mem) is
> > full?
> >
> > Am I right?
> >
> > Thanks!
> > Menglong Dong
>
> Thanks for describing the scenario in more detail. (Some kind of
> packetdrill script or other program to reproduce this issue would be
> nice, too, as Eric noted.)
>
> You mention in step (4.) above that some of the connections keep
> retransmitting again and again. Are those connections receiving any
> ACKs in response to their retransmissions? Perhaps they are receiving
> dupacks?

Actually, these packets are dropped without any reply, not even
dupacks. The skb is dropped directly when tcp_try_rmem_schedule()
fails in tcp_data_queue(). That's reasonable, as it's useless to
reply with an ACK to the sender: that would only make the sender
fast-retransmit the packet, and since we are out of memory,
retransmitting can't solve the problem.
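Roughly, the drop happens on this path (simplified from
tcp_data_queue() in net/ipv4/tcp_input.c; the exact code varies by
kernel version):

	/* In-order data has arrived, but receive-buffer accounting
	 * fails under memory pressure: the skb is dropped and no ACK
	 * of any kind (not even a dupack) goes back to the sender.
	 */
	if (skb_queue_len(&sk->sk_receive_queue) == 0)
		sk_forced_mem_schedule(sk, skb->truesize);
	else if (tcp_try_rmem_schedule(sk, skb, skb->truesize)) {
		reason = SKB_DROP_REASON_PROTO_MEM;
		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPRCVQDROP);
		sk->sk_data_ready(sk);
		goto drop;	/* silent drop: sender sees nothing */
	}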
> If so, then perhaps we could solve this problem without
> depending on a violation of the TCP spec (which says the receive
> window should not be retracted) in the following way: when a data
> sender suffers a retransmission timeout, and retransmits the first
> unacknowledged segment, and receives a dupack for SND.UNA instead of
> an ACK covering the RTO-retransmitted segment, then the data sender
> should estimate that the receiver doesn't have enough memory to
> buffer the retransmitted packet. In that case, the data sender should
> enter the 0-probe state and repeatedly set the ICSK_TIME_PROBE0 timer
> to call tcp_probe_timer().
>
> Basically we could try to enhance the sender-side logic to try to
> distinguish between two kinds of problems:
>
> (a) Repeated data packet loss caused by congestion, routing problems,
> or connectivity problems. In this case, the data sender uses
> ICSK_TIME_RETRANS and tcp_retransmit_timer(), and backs off and only
> retries sysctl_tcp_retries2 times before timing out the connection.
>
> (b) A receiver that is repeatedly sending dupacks but not ACKing
> retransmitted data because it doesn't have any memory. In this case,
> the data sender uses ICSK_TIME_PROBE0 and tcp_probe_timer(), and
> backs off but keeps retrying as long as the data sender receives
> ACKs.

I'm not sure this is an ideal method, as it may not be rigorous to
conclude from dupacks that the receiver is out of memory. Packet loss
can also cause multiple dupacks.

Thanks!
Menglong Dong

> AFAICT that would be another way to reach the happy state you
> mention: "all the 100 connections will enter the 0-probe state and
> connection breaks won't happen", and we could reach that state
> without violating the TCP protocol spec and without requiring changes
> on the receiver side (so that this fix could help in scenarios where
> the memory-constrained receiver is an older stack without special new
> behavior).
>
> Eric, Yuchung, Menglong: do you think something like that would work?
>
> neal
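For concreteness, the check Neal describes above might be shaped
roughly like the sketch below. This is purely illustrative, not code
from any kernel: the hook point and the helper
rto_rexmit_still_unacked() are hypothetical, while ICSK_TIME_PROBE0,
tcp_probe0_base() and inet_csk_reset_xmit_timer() are the existing
zero-window-probe primitives.

/* Hypothetical sender-side heuristic: after an RTO retransmission of
 * the first unacked segment, treat a dupack for SND.UNA as a hint
 * that the receiver is out of memory, and arm the 0-probe timer
 * instead of backing off the retransmit timer toward a connection
 * reset.
 */
static void tcp_guess_receiver_oom(struct sock *sk, u32 ack_seq)
{
	struct tcp_sock *tp = tcp_sk(sk);

	if (ack_seq == tp->snd_una &&	    /* dupack for SND.UNA */
	    rto_rexmit_still_unacked(sk)) { /* hypothetical helper */
		/* Case (b): receiver alive but likely out of memory.
		 * Keep probing as long as ACKs keep coming, as
		 * tcp_probe_timer() already does.
		 */
		inet_csk_reset_xmit_timer(sk, ICSK_TIME_PROBE0,
					  tcp_probe0_base(sk),
					  TCP_RTO_MAX);
	}
	/* Otherwise case (a): ordinary loss; normal ICSK_TIME_RETRANS
	 * handling applies.
	 */
}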