netdev - Re: [PATCH net] tcp: fix functions of tcp_congestion_ops from being called before initialization

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160729120918.GB26237@breakpoint.cc>
Date:	Fri, 29 Jul 2016 14:09:18 +0200
From:	Florian Westphal <fw@...len.de>
To:	"Li, Ji" <jli@...mai.com>
Cc:	"davem@...emloft.net" <davem@...emloft.net>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"fw@...len.de" <fw@...len.de>,
	"dborkman@...hat.com" <dborkman@...hat.com>,
	"glenn.judd@...ganstanley.com" <glenn.judd@...ganstanley.com>,
	"stephen@...workplumber.org" <stephen@...workplumber.org>
Subject: Re: [PATCH net] tcp: fix functions of tcp_congestion_ops from being
 called before initialization

Li, Ji <jli@...mai.com> wrote:
> In Linux 3.17 and earlier, tcp_init_congestion_ops (i.e. tcp_reno) is
> used as the ca_ops during 3WHS, and after 3WHS, ca_ops is assigned as 
> the default congestion control set by sysctl and immediately its parameters
> stored in icsk_ca_priv[] are initialized. Commit 55d8694fa82c ("net:
> tcp: assign tcp cong_ops when tcp sk is created") splits assignment and
> initialization into two steps: assignment is done before SYN or SYN-ACK
> is sent out; initialization is done after 3WHS (assume without
> fastopen). But this can cause out-of-order invocation for ca_ops functions
> other than .init() during 3WHS, as they could be called before its
> parameters get initialized. It may cause unexpected behavior for
> congestion controls, and make troubles for those that need dynamic
> object allocation, like tcp_cdg etc.

What exactly is the problem?
Kernel crash?

AFAICS cdg can cope with NULL ca->gradients.

> We used tcp_dctcp as an example to visualize the problem, and set it as
> default congestion control via sysctl. Three parameters
> (ca->prior_snd_una, ca->prior_rcv_nxt, ca->dctcp_alpha) were monitored
> when functions, such as dctcp_update_alpha() and dctcp_ssthresh(), are
> called during 3WHS. All of three are found to be zero, which is likely
> impossible if dctcp_init() was called ahead, where those three
> parameters should be initialized. Some other congestion controls are
> examined too and the same problem was reproduced.

Why is this a problem?

> diff --git a/include/net/tcp.h b/include/net/tcp.h
> +{
> +       if (inet_csk(sk)->icsk_ca_initialized)
> +               return inet_csk(sk)->icsk_ca_ops->ssthresh(sk);
> +       else
> +               return tcp_reno_ssthresh(sk);
> +}
> +
>  /* Enter Loss state. If we detect SACK reneging, forget all SACK information
>   * and reset tags completely, otherwise preserve SACKs. If receiver
>   * dropped its ofo queue, we will know this due to reneging detection.
> @@ -1896,7 +1904,7 @@ void tcp_enter_loss(struct sock *sk)
>             !after(tp->high_seq, tp->snd_una) ||
>             (icsk->icsk_ca_state == TCP_CA_Loss && !icsk->icsk_retransmits)) {
>                 tp->prior_ssthresh = tcp_current_ssthresh(sk);
> -               tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk);
> +               tp->snd_ssthresh = tcp_ca_ssthresh(sk);
>                 tcp_ca_event(sk, CA_EVENT_LOSS);
>                 tcp_init_undo(tp);
>         }

Can you explain how we can do loss recovery on a non-established
connection ....?

> @@ -3335,7 +3343,8 @@ static void tcp_cong_control(struct sock *sk, u32 ack, u32 acked_sacked,
>         if (tcp_in_cwnd_reduction(sk)) {
>                 /* Reduce cwnd if state mandates */
>                 tcp_cwnd_reduction(sk, acked_sacked, flag);
> -       } else if (tcp_may_raise_cwnd(sk, flag)) {
> +       } else if (tcp_may_raise_cwnd(sk, flag) &&
> +                  inet_csk(sk)->icsk_ca_initialized) {
>                 /* Advance cwnd if state allows */
>                 tcp_cong_avoid(sk, ack, acked_sacked);

Same here.  How is this called for minisock/sk with non-inited cong ops?
Once sk moves to TCP_ESTABLISHED congestion ops are supposed to
be initialized.

If thats not the case then thats a bug and should be fixed rather
than not calling the cc state machinery any more.