[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160729120918.GB26237@breakpoint.cc>
Date: Fri, 29 Jul 2016 14:09:18 +0200
From: Florian Westphal <fw@...len.de>
To: "Li, Ji" <jli@...mai.com>
Cc: "davem@...emloft.net" <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"fw@...len.de" <fw@...len.de>,
"dborkman@...hat.com" <dborkman@...hat.com>,
"glenn.judd@...ganstanley.com" <glenn.judd@...ganstanley.com>,
"stephen@...workplumber.org" <stephen@...workplumber.org>
Subject: Re: [PATCH net] tcp: fix functions of tcp_congestion_ops from being
called before initialization
Li, Ji <jli@...mai.com> wrote:
> In Linux 3.17 and earlier, tcp_init_congestion_ops (i.e. tcp_reno) is
> used as the ca_ops during 3WHS, and after 3WHS, ca_ops is assigned as
> the default congestion control set by sysctl and immediately its parameters
> stored in icsk_ca_priv[] are initialized. Commit 55d8694fa82c ("net:
> tcp: assign tcp cong_ops when tcp sk is created") splits assignment and
> initialization into two steps: assignment is done before SYN or SYN-ACK
> is sent out; initialization is done after 3WHS (assume without
> fastopen). But this can cause out-of-order invocation for ca_ops functions
> other than .init() during 3WHS, as they could be called before its
> parameters get initialized. It may cause unexpected behavior for
> congestion controls, and make troubles for those that need dynamic
> object allocation, like tcp_cdg etc.
What exactly is the problem?
Kernel crash?
AFAICS cdg can cope with NULL ca->gradients.
> We used tcp_dctcp as an example to visualize the problem, and set it as
> default congestion control via sysctl. Three parameters
> (ca->prior_snd_una, ca->prior_rcv_nxt, ca->dctcp_alpha) were monitored
> when functions, such as dctcp_update_alpha() and dctcp_ssthresh(), are
> called during 3WHS. All of three are found to be zero, which is likely
> impossible if dctcp_init() was called ahead, where those three
> parameters should be initialized. Some other congestion controls are
> examined too and the same problem was reproduced.
Why is this a problem?
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> +{
> + if (inet_csk(sk)->icsk_ca_initialized)
> + return inet_csk(sk)->icsk_ca_ops->ssthresh(sk);
> + else
> + return tcp_reno_ssthresh(sk);
> +}
> +
> /* Enter Loss state. If we detect SACK reneging, forget all SACK information
> * and reset tags completely, otherwise preserve SACKs. If receiver
> * dropped its ofo queue, we will know this due to reneging detection.
> @@ -1896,7 +1904,7 @@ void tcp_enter_loss(struct sock *sk)
> !after(tp->high_seq, tp->snd_una) ||
> (icsk->icsk_ca_state == TCP_CA_Loss && !icsk->icsk_retransmits)) {
> tp->prior_ssthresh = tcp_current_ssthresh(sk);
> - tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk);
> + tp->snd_ssthresh = tcp_ca_ssthresh(sk);
> tcp_ca_event(sk, CA_EVENT_LOSS);
> tcp_init_undo(tp);
> }
Can you explain how we can do loss recovery on a non-established
connection ....?
> @@ -3335,7 +3343,8 @@ static void tcp_cong_control(struct sock *sk, u32 ack, u32 acked_sacked,
> if (tcp_in_cwnd_reduction(sk)) {
> /* Reduce cwnd if state mandates */
> tcp_cwnd_reduction(sk, acked_sacked, flag);
> - } else if (tcp_may_raise_cwnd(sk, flag)) {
> + } else if (tcp_may_raise_cwnd(sk, flag) &&
> + inet_csk(sk)->icsk_ca_initialized) {
> /* Advance cwnd if state allows */
> tcp_cong_avoid(sk, ack, acked_sacked);
Same here. How is this called for minisock/sk with non-inited cong ops?
Once sk moves to TCP_ESTABLISHED congestion ops are supposed to
be initialized.
If thats not the case then thats a bug and should be fixed rather
than not calling the cc state machinery any more.
Powered by blists - more mailing lists