netdev - Re: [PATCH net] tcp: fix functions of tcp_congestion_ops from being called before initialization

Message-ID: <E61E88C0-B325-4F6C-81B7-0D5FE906C0CC@akamai.com>
Date:	Fri, 29 Jul 2016 21:26:36 +0000
From:	"Li, Ji" <jli@...mai.com>
To:	Florian Westphal <fw@...len.de>
CC:	"davem@...emloft.net" <davem@...emloft.net>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"dborkman@...hat.com" <dborkman@...hat.com>,
	"glenn.judd@...ganstanley.com" <glenn.judd@...ganstanley.com>,
	"stephen@...workplumber.org" <stephen@...workplumber.org>
Subject: Re: [PATCH net] tcp: fix functions of tcp_congestion_ops from being
 called before initialization

Thank you for the reply. I don’t think there would be a kernel crash, but
calling these functions before initialization does cause unexpected behavior.
Let’s use dctcp as an example again.

If SYN loss happens during active open, dctcp_ssthresh() is called to calculate
the new ssthresh using an uninitialized dctcp_alpha (i.e. 0) instead of the
alpha specified as a module parameter. Is this expected? As another example,
when the ACK for the SYN is being processed, dctcp_update_alpha() is called
with an uninitialized prior_snd_una (again, 0). That makes the local variable
acked_bytes equal to tp->snd_una, which is clearly wrong, and it is then used
to calculate the new alpha. I agree that alpha will eventually be initialized
when .init() gets called. But what is the point of invoking those functions
with uninitialized parameters in the first place?
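
A minimal user-space sketch of the two cases above. It assumes the ssthresh
formula cwnd - ((cwnd * alpha) >> 11) with a floor of 2, and
acked_bytes = snd_una - prior_snd_una, as I read them in tcp_dctcp.c. The
struct and function names (dctcp_model, model_ssthresh, model_acked_bytes) are
hypothetical stand-ins, not kernel symbols.

```c
#include <stdint.h>

/* Alpha is scaled by 2^10; 1024 is assumed to be the maximum value. */
#define DCTCP_MAX_ALPHA 1024U

/* Hypothetical stand-in for the per-socket DCTCP private state. */
struct dctcp_model {
	uint32_t dctcp_alpha;   /* 0 until the init hook has run */
	uint32_t prior_snd_una; /* 0 until the init hook has run */
};

/* Assumed formula: reduce cwnd by alpha/2, never going below 2. */
static uint32_t model_ssthresh(const struct dctcp_model *ca, uint32_t snd_cwnd)
{
	uint32_t reduced = snd_cwnd - ((snd_cwnd * ca->dctcp_alpha) >> 11);

	return reduced > 2U ? reduced : 2U;
}

/* Bytes newly acknowledged since the previous alpha update. */
static uint32_t model_acked_bytes(const struct dctcp_model *ca, uint32_t snd_una)
{
	return snd_una - ca->prior_snd_una;
}
```

With snd_cwnd = 10, zeroed state gives an ssthresh of 10 (no reduction at all),
while alpha = 1024 gives 5. With snd_una = 3000000001, zeroed state reports
3000000001 acked bytes, where a prior_snd_una of 3000000000 would report 1.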
 
The possible unexpected effect for a particular congestion control depends on
what that algorithm requires of its parameters. IMHO, it is unreasonable and
dangerous to call a ca_ops function while parameters that are supposed to be
initialized to non-zero values are still zero.

By “non-established state”, do you mean TCP_SYN_SENT/TCP_SYN_RECV?
In that case, the patch falls back to tcp_reno_ssthresh() for .ssthresh() if
the congestion control is uninitialized, and .cong_avoid() is not called if
uninitialized. My impression is that, according to RFC 3390, the initial cwnd
should not grow on the SYN/ACK or its acknowledgement during the 3WHS. Please
let me know if that is wrong. But if ca_ops functions really do need to be
called during the 3WHS, why don’t we initialize them earlier?
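
The fallback described above can be sketched in user space as follows. This is
only an illustration of the dispatch pattern, not the patch itself: all the
*_model names are hypothetical, the flag stands in for icsk_ca_initialized, and
the Reno rule is assumed to be max(cwnd / 2, 2).

```c
#include <stdbool.h>
#include <stdint.h>

struct sock_model;

/* Hypothetical stand-in for the ssthresh hook in tcp_congestion_ops. */
struct ca_ops_model {
	uint32_t (*ssthresh)(const struct sock_model *sk);
};

/* Hypothetical stand-in for the socket plus its congestion control state. */
struct sock_model {
	const struct ca_ops_model *ca_ops;
	bool ca_initialized; /* models icsk_ca_initialized from the patch */
	uint32_t snd_cwnd;
	uint32_t alpha;      /* private parameter, only valid after init */
};

/* Assumed Reno rule: halve cwnd, never going below 2. */
static uint32_t reno_ssthresh_model(const struct sock_model *sk)
{
	uint32_t half = sk->snd_cwnd >> 1;

	return half > 2U ? half : 2U;
}

/* An algorithm whose result depends on its private parameter. */
static uint32_t alpha_ssthresh_model(const struct sock_model *sk)
{
	uint32_t reduced = sk->snd_cwnd - ((sk->snd_cwnd * sk->alpha) >> 11);

	return reduced > 2U ? reduced : 2U;
}

/* Use the algorithm's hook only after init; otherwise fall back to Reno. */
static uint32_t ca_ssthresh_model(const struct sock_model *sk)
{
	if (sk->ca_initialized)
		return sk->ca_ops->ssthresh(sk);
	return reno_ssthresh_model(sk);
}
```

Before initialization the wrapper never reads alpha, so whatever the private
area holds cannot influence the result.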


On 7/29/16, 5:09 AM, "Florian Westphal" <fw@...len.de> wrote:

    Li, Ji <jli@...mai.com> wrote:
    > In Linux 3.17 and earlier, tcp_init_congestion_ops (i.e. tcp_reno) is
    > used as the ca_ops during 3WHS, and after 3WHS, ca_ops is assigned as 
    > the default congestion control set by sysctl and immediately its parameters
    > stored in icsk_ca_priv[] are initialized. Commit 55d8694fa82c ("net:
    > tcp: assign tcp cong_ops when tcp sk is created") splits assignment and
    > initialization into two steps: assignment is done before SYN or SYN-ACK
    > is sent out; initialization is done after 3WHS (assume without
    > fastopen). But this can cause out-of-order invocation for ca_ops functions
    > other than .init() during 3WHS, as they could be called before its
    > parameters get initialized. It may cause unexpected behavior for
    > congestion controls, and make troubles for those that need dynamic
    > object allocation, like tcp_cdg etc.
    
    What exactly is the problem?
    Kernel crash?
    
    AFAICS cdg can cope with NULL ca->gradients.
    
    > We used tcp_dctcp as an example to visualize the problem, and set it as
    > default congestion control via sysctl. Three parameters
    > (ca->prior_snd_una, ca->prior_rcv_nxt, ca->dctcp_alpha) were monitored
    > when functions, such as dctcp_update_alpha() and dctcp_ssthresh(), are
    > called during 3WHS. All of three are found to be zero, which is likely
    > impossible if dctcp_init() was called ahead, where those three
    > parameters should be initialized. Some other congestion controls are
    > examined too and the same problem was reproduced.
    
    Why is this a problem?
    
    > diff --git a/include/net/tcp.h b/include/net/tcp.h
    > +{
    > +       if (inet_csk(sk)->icsk_ca_initialized)
    > +               return inet_csk(sk)->icsk_ca_ops->ssthresh(sk);
    > +       else
    > +               return tcp_reno_ssthresh(sk);
    > +}
    > +
    >  /* Enter Loss state. If we detect SACK reneging, forget all SACK information
    >   * and reset tags completely, otherwise preserve SACKs. If receiver
    >   * dropped its ofo queue, we will know this due to reneging detection.
    > @@ -1896,7 +1904,7 @@ void tcp_enter_loss(struct sock *sk)
    >             !after(tp->high_seq, tp->snd_una) ||
    >             (icsk->icsk_ca_state == TCP_CA_Loss && !icsk->icsk_retransmits)) {
    >                 tp->prior_ssthresh = tcp_current_ssthresh(sk);
    > -               tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk);
    > +               tp->snd_ssthresh = tcp_ca_ssthresh(sk);
    >                 tcp_ca_event(sk, CA_EVENT_LOSS);
    >                 tcp_init_undo(tp);
    >         }
    
    Can you explain how we can do loss recovery on a non-established
    connection ....?
    
    > @@ -3335,7 +3343,8 @@ static void tcp_cong_control(struct sock *sk, u32 ack, u32 acked_sacked,
    >         if (tcp_in_cwnd_reduction(sk)) {
    >                 /* Reduce cwnd if state mandates */
    >                 tcp_cwnd_reduction(sk, acked_sacked, flag);
    > -       } else if (tcp_may_raise_cwnd(sk, flag)) {
    > +       } else if (tcp_may_raise_cwnd(sk, flag) &&
    > +                  inet_csk(sk)->icsk_ca_initialized) {
    >                 /* Advance cwnd if state allows */
    >                 tcp_cong_avoid(sk, ack, acked_sacked);
    
    Same here.  How is this called for minisock/sk with non-inited cong ops?
    Once sk moves to TCP_ESTABLISHED congestion ops are supposed to
    be initialized.
    
    If thats not the case then thats a bug and should be fixed rather
    than not calling the cc state machinery any more.