lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Tue, 10 Jul 2018 05:38:55 -0700 From: Eric Dumazet <eric.dumazet@...il.com> To: Jon Maxwell <jmaxwell37@...il.com>, davem@...emloft.net Cc: edumazet@...gle.com, eric.dumazet@...il.com, ncardwell@...gle.com, David.Laight@...lab.com, kuznet@....inr.ac.ru, yoshfuji@...ux-ipv6.org, netdev@...r.kernel.org, linux-kernel@...r.kernel.org, jmaxwell@...hat.com Subject: Re: [net-next,v3] tcp: Improve setsockopt() TCP_USER_TIMEOUT accuracy On 07/09/2018 11:51 PM, Jon Maxwell wrote: > v3 contains the following suggestions by Neal Cardwell: > > 1) Fix up units mismatch regarding msec/jiffies. > 2) Address possiblility of time_remaining being negative. > 3) Add a helper routine tcp_clamp_rto_to_user_timeout() to do the rto > calculation. > 4) Move start_ts logic into helper routine tcp_retrans_stamp() to > validate tcp_sk(sk)->retrans_stamp. > 5) Some u32 declation and return refactoring. > 6) Return 0 instead of false in tcp_retransmit_stamp(), it's not a bool. > > Suggestions by David Laight: > > 1) Don't cache rto in tcp_clamp_rto_to_user_timeout(). > 2) Use conditional operator instead of min_t() in > tcp_clamp_rto_to_user_timeout() > > Changes: > > 1) Call tcp_clamp_rto_to_user_timeout(sk) as an argument to > inet_csk_reset_xmit_timer() to save on rto declaration. > > Every time the TCP retransmission timer fires. It checks to see if there is a > timeout before scheduling the next retransmit timer. The retransmit interval > between each retransmission increases exponentially. The issue is that in order > for the timeout to occur the retransmit timer needs to fire again. If the user > timeout check happens after the 9th retransmit for example. It needs to wait for > the 10th retransmit timer to fire in order to evaluate whether a timeout has > occurred or not. If the interval is large enough then the timeout will be > inaccurate. > > For example with a TCP_USER_TIMEOUT of 10 seconds without patch: > > 1st retransmit: > > 22:25:18.973488 IP host1.49310 > host2.search-agent: Flags [.] > > Last retransmit: > > 22:25:26.205499 IP host1.49310 > host2.search-agent: Flags [.] > > Timeout: > > send: Connection timed out > Sun Jul 1 22:25:34 EDT 2018 > > We can see that last retransmit took ~7 seconds. Which pushed the total > timeout to ~15 seconds instead of the expected 10 seconds. This gets more > inaccurate the larger the TCP_USER_TIMEOUT value. As the interval increases. > > Add tcp_clamp_rto_to_user_timeout() to determine if the user rto has expired. > Or whether the rto interval needs to be recalculated. Use the original interval > if user rto is not set. > > Test results with the patch is the expected 10 second timeout: > > 1st retransmit: > > 01:37:59.022555 IP host1.49310 > host2.search-agent: Flags [.] > > Last retransmit: > > 01:38:06.486558 IP host1.49310 > host2.search-agent: Flags [.] > > Timeout: > > send: Connection timed out > Mon Jul 2 01:38:09 EDT 2018 > > Signed-off-by: Jon Maxwell <jmaxwell37@...il.com> > --- > net/ipv4/tcp_timer.c | 49 +++++++++++++++++++++++++++++++++++++++---------- > 1 file changed, 39 insertions(+), 10 deletions(-) > > diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c > index 3b3611729928..93239e58776d 100644 > --- a/net/ipv4/tcp_timer.c > +++ b/net/ipv4/tcp_timer.c > @@ -22,6 +22,38 @@ > #include <linux/gfp.h> > #include <net/tcp.h> > > +u32 tcp_retransmit_stamp(struct sock *sk) const struct sock *sk; (To clearly express the fact this helper does not touch the socket) > +{ > + u32 start_ts = tcp_sk(sk)->retrans_stamp; > + > + if (unlikely(!start_ts)) { > + struct sk_buff *head = tcp_rtx_queue_head(sk); > + > + if (!head) > + return 0; > + start_ts = tcp_skb_timestamp(head); > + } > + return start_ts; > +} > + > +static __u32 tcp_clamp_rto_to_user_timeout(struct sock *sk) const struct sock *sk > +{ > + struct inet_connection_sock *icsk = inet_csk(sk); > + __u32 elapsed, user_timeout; > + u32 start_ts; Mixing __u32 and u32 in new code is confusing. __u32 are needed in uapi include files, not in the C kernel code. u32 elapsed, user_timeout, start_ts; > + > + start_ts = tcp_retransmit_stamp(sk); > + if (!icsk->icsk_user_timeout || !start_ts) > + return icsk->icsk_rto; > + elapsed = tcp_time_stamp(tcp_sk(sk)) - start_ts; > + user_timeout = jiffies_to_msecs(icsk->icsk_user_timeout); Note that if we always do jiffies_to_msecs(icsk->icsk_user_timeout) in TCP, we also could change the convention and store msecs in this field instead of jiffies. That would eliminate the msecs_to_jiffies() and jiffies_to_msecs() dance. (That would be done in a patch of its own, of course) > + if (elapsed >= user_timeout) > + return 1; /* user timeout has passed; fire ASAP */ > + else > + return (icsk->icsk_rto < msecs_to_jiffies(user_timeout - elapsed)) ? > + icsk->icsk_rto : msecs_to_jiffies(user_timeout - elapsed); return min_t(i32, icsk->icsk_rto, msecs_to_jiffies(user_timeout - elapsed)); > +} > + > /** > * tcp_write_err() - close socket and save error info > * @sk: The socket the error has appeared on. > @@ -161,19 +193,15 @@ static bool retransmits_timed_out(struct sock *sk, > unsigned int timeout) > { > const unsigned int rto_base = TCP_RTO_MIN; > - unsigned int linear_backoff_thresh, start_ts; > + unsigned int linear_backoff_thresh; > + u32 start_ts; This seems a gratuitous change, making your patch bigger than necessary. u32 and unsigned int are the same essentially. > > if (!inet_csk(sk)->icsk_retransmits) > return false; > > - start_ts = tcp_sk(sk)->retrans_stamp; > - if (unlikely(!start_ts)) { > - struct sk_buff *head = tcp_rtx_queue_head(sk); > - > - if (!head) > - return false; > - start_ts = tcp_skb_timestamp(head); > - } > + start_ts = tcp_retransmit_stamp(sk); > + if (!start_ts) > + return false; One thing you could do is to make this refactoring in a single patch, and provide a patch series, with small units, making the review easier. > > if (likely(timeout == 0)) { > linear_backoff_thresh = ilog2(TCP_RTO_MAX/rto_base); > @@ -535,7 +563,8 @@ void tcp_retransmit_timer(struct sock *sk) > /* Use normal (exponential) backoff */ > icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX); > } > - inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, icsk->icsk_rto, TCP_RTO_MAX); > + inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, > + tcp_clamp_rto_to_user_timeout(sk), TCP_RTO_MAX); > if (retransmits_timed_out(sk, net->ipv4.sysctl_tcp_retries1 + 1, 0)) > __sk_dst_reset(sk); > > Thanks !
Powered by blists - more mailing lists