lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <CANn89iJ4b83GKa+LnGUsaTvBwK+eGwaetktOwXfoR9J5b9JbzQ@mail.gmail.com> Date: Tue, 7 Nov 2023 21:26:41 +0100 From: Eric Dumazet <edumazet@...gle.com> To: David Morley <morleyd.kernel@...il.com> Cc: David Miller <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, netdev@...r.kernel.org, David Morley <morleyd@...gle.com>, Yuchung Cheng <ycheng@...gle.com>, Neal Cardwell <ncardwell@...gle.com> Subject: Re: [PATCH net-next] tcp: make the first N SYN RTO backoffs linear On Tue, May 9, 2023 at 8:06 PM David Morley <morleyd.kernel@...il.com> wrote: > > From: David Morley <morleyd@...gle.com> > > Currently the SYN RTO schedule follows an exponential backoff > scheme, which can be unnecessarily conservative in cases where > there are link failures. In such cases, it's better to > aggressively try to retransmit packets, so it takes routers > less time to find a repath with a working link. > > We chose a default value for this sysctl of 4, to follow > the macOS and IOS backoff scheme of 1,1,1,1,1,2,4,8, ... > MacOS and IOS have used this backoff schedule for over > a decade, since before this 2009 IETF presentation > discussed the behavior: > https://www.ietf.org/proceedings/75/slides/tcpm-1.pdf > > This commit makes the SYN RTO schedule start with a number of > linear backoffs given by the following sysctl: > * tcp_syn_linear_timeouts > > This changes the SYN RTO scheme to be: init_rto_val for > tcp_syn_linear_timeouts, exp backoff starting at init_rto_val > > For example if init_rto_val = 1 and tcp_syn_linear_timeouts = 2, our > backoff scheme would be: 1, 1, 1, 2, 4, 8, 16, ... > > Signed-off-by: David Morley <morleyd@...gle.com> > Signed-off-by: Yuchung Cheng <ycheng@...gle.com> > Signed-off-by: Neal Cardwell <ncardwell@...gle.com> > Tested-by: David Morley <morleyd@...gle.com> > Reviewed-by: Eric Dumazet <edumazet@...gle.com> > --- > Documentation/networking/ip-sysctl.rst | 17 ++++++++++++++--- > include/net/netns/ipv4.h | 1 + > net/ipv4/sysctl_net_ipv4.c | 10 ++++++++++ > net/ipv4/tcp_ipv4.c | 1 + > net/ipv4/tcp_timer.c | 17 +++++++++++++---- > 5 files changed, 39 insertions(+), 7 deletions(-) > > diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst > index 6ec06a33688a..3f6d3d5f5626 100644 > --- a/Documentation/networking/ip-sysctl.rst > +++ b/Documentation/networking/ip-sysctl.rst > @@ -881,9 +881,10 @@ tcp_fastopen_key - list of comma separated 32-digit hexadecimal INTEGERs > tcp_syn_retries - INTEGER > Number of times initial SYNs for an active TCP connection attempt > will be retransmitted. Should not be higher than 127. Default value > - is 6, which corresponds to 63seconds till the last retransmission > - with the current initial RTO of 1second. With this the final timeout > - for an active TCP connection attempt will happen after 127seconds. > + is 6, which corresponds to 67seconds (with tcp_syn_linear_timeouts = 4) > + till the last retransmission with the current initial RTO of 1second. > + With this the final timeout for an active TCP connection attempt > + will happen after 131seconds. > > tcp_timestamps - INTEGER > Enable timestamps as defined in RFC1323. > @@ -946,6 +947,16 @@ tcp_pacing_ca_ratio - INTEGER > > Default: 120 > > +tcp_syn_linear_timeouts - INTEGER > + The number of times for an active TCP connection to retransmit SYNs with > + a linear backoff timeout before defaulting to an exponential backoff > + timeout. This has no effect on SYNACK at the passive TCP side. > + > + With an initial RTO of 1 and tcp_syn_linear_timeouts = 4 we would > + expect SYN RTOs to be: 1, 1, 1, 1, 1, 2, 4, ... (4 linear timeouts, > + and the first exponential backoff using 2^0 * initial_RTO). > + Default: 4 > + > tcp_tso_win_divisor - INTEGER > This allows control over what percentage of the congestion window > can be consumed by a single TSO frame. > diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h > index db762e35aca9..a4efb7a2796c 100644 > --- a/include/net/netns/ipv4.h > +++ b/include/net/netns/ipv4.h > @@ -194,6 +194,7 @@ struct netns_ipv4 { > int sysctl_udp_rmem_min; > > u8 sysctl_fib_notify_on_flag_change; > + u8 sysctl_tcp_syn_linear_timeouts; > > #ifdef CONFIG_NET_L3_MASTER_DEV > u8 sysctl_udp_l3mdev_accept; > diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c > index 40fe70fc2015..6ae3345a3bdf 100644 > --- a/net/ipv4/sysctl_net_ipv4.c > +++ b/net/ipv4/sysctl_net_ipv4.c > @@ -34,6 +34,7 @@ static int ip_ttl_min = 1; > static int ip_ttl_max = 255; > static int tcp_syn_retries_min = 1; > static int tcp_syn_retries_max = MAX_TCP_SYNCNT; > +static int tcp_syn_linear_timeouts_max = MAX_TCP_SYNCNT; > static int ip_ping_group_range_min[] = { 0, 0 }; > static int ip_ping_group_range_max[] = { GID_T_MAX, GID_T_MAX }; > static u32 u32_max_div_HZ = UINT_MAX / HZ; > @@ -1470,6 +1471,15 @@ static struct ctl_table ipv4_net_table[] = { > .extra1 = SYSCTL_ZERO, > .extra2 = &tcp_plb_max_cong_thresh, > }, > + { > + .procname = "tcp_syn_linear_timeouts", > + .data = &init_net.ipv4.sysctl_tcp_syn_linear_timeouts, > + .maxlen = sizeof(u8), > + .mode = 0644, > + .proc_handler = proc_dou8vec_minmax, > + .extra1 = SYSCTL_ZERO, > + .extra2 = &tcp_syn_linear_timeouts_max, > + }, > { } > }; > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > index 39bda2b1066e..db24ed8f8509 100644 > --- a/net/ipv4/tcp_ipv4.c > +++ b/net/ipv4/tcp_ipv4.c > @@ -3275,6 +3275,7 @@ static int __net_init tcp_sk_init(struct net *net) > else > net->ipv4.tcp_congestion_control = &tcp_reno; > > + net->ipv4.sysctl_tcp_syn_linear_timeouts = 4; > return 0; > } > > diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c > index b839c2f91292..0d93a2573807 100644 > --- a/net/ipv4/tcp_timer.c > +++ b/net/ipv4/tcp_timer.c > @@ -234,14 +234,19 @@ static int tcp_write_timeout(struct sock *sk) > struct tcp_sock *tp = tcp_sk(sk); > struct net *net = sock_net(sk); > bool expired = false, do_reset; > - int retry_until; > + int retry_until, max_retransmits; > > if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) { > if (icsk->icsk_retransmits) > __dst_negative_advice(sk); > retry_until = icsk->icsk_syn_retries ? : > READ_ONCE(net->ipv4.sysctl_tcp_syn_retries); > - expired = icsk->icsk_retransmits >= retry_until; > + > + max_retransmits = retry_until; > + if (sk->sk_state == TCP_SYN_SENT) > + max_retransmits += READ_ONCE(net->ipv4.sysctl_tcp_syn_linear_timeouts); > + > + expired = icsk->icsk_retransmits >= max_retransmits; > } else { > if (retransmits_timed_out(sk, READ_ONCE(net->ipv4.sysctl_tcp_retries1), 0)) { > /* Black hole detection */ > @@ -577,8 +582,12 @@ void tcp_retransmit_timer(struct sock *sk) > icsk->icsk_retransmits <= TCP_THIN_LINEAR_RETRIES) { > icsk->icsk_backoff = 0; > icsk->icsk_rto = min(__tcp_set_rto(tp), TCP_RTO_MAX); > - } else { > - /* Use normal (exponential) backoff */ > + } else if (sk->sk_state != TCP_SYN_SENT || > + icsk->icsk_backoff > > + READ_ONCE(net->ipv4.sysctl_tcp_syn_linear_timeouts)) { > + /* Use normal (exponential) backoff unless linear timeouts are > + * activated. > + */ Hi David, back to this patch. If sysctl_tcp_syn_linear_timeouts is set to a high value, we could end up with icsk->icsk_backoff > 64 This can generate various overflows later, eg from inet_csk_rto_backoff(), called from tcp_ld_RTO_revert() More generally tcp_ld_RTO_revert() is not aware of linear timeouts. Thank you.
Powered by blists - more mailing lists