lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 12 Dec 2020 11:01:15 -0800
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Yuchung Cheng <ycheng@...gle.com>
Cc:     Eric Dumazet <edumazet@...gle.com>,
        David Miller <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Alexey Kuznetsov <kuznet@....inr.ac.ru>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        netdev <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>, Martin Lau <kafai@...com>,
        Kernel Team <kernel-team@...com>
Subject: Re: [net-next PATCH] tcp: Add logic to check for SYN w/ data in tcp_simple_retransmit

On Sat, Dec 12, 2020 at 10:34 AM Yuchung Cheng <ycheng@...gle.com> wrote:
>
> On Fri, Dec 11, 2020 at 5:28 PM Alexander Duyck
> <alexander.duyck@...il.com> wrote:
> >
> > From: Alexander Duyck <alexanderduyck@...com>
> >
> > There are cases where a fastopen SYN may trigger either a ICMP_TOOBIG
> > message in the case of IPv6 or a fragmentation request in the case of
> > IPv4. This results in the socket stalling for a second or more as it does
> > not respond to the message by retransmitting the SYN frame.
> >
> > Normally a SYN frame should not be able to trigger a ICMP_TOOBIG or
> > ICMP_FRAG_NEEDED however in the case of fastopen we can have a frame that
> > makes use of the entire MSS. In the case of fastopen it does, and an
> > additional complication is that the retransmit queue doesn't contain the
> > original frames. As a result when tcp_simple_retransmit is called and
> > walks the list of frames in the queue it may not mark the frames as lost
> > because both the SYN and the data packet each individually are smaller than
> > the MSS size after the adjustment. This results in the socket being stalled
> > until the retransmit timer kicks in and forces the SYN frame out again
> > without the data attached.
> >
> > In order to resolve this we can generate our best estimate for the original
> > packet size by detecting the fastopen SYN frame and then adding the
> > overhead for MAX_TCP_OPTION_SPACE and verifying if the SYN w/ data would
> > have exceeded the MSS. If so we can mark the frame as lost and retransmit
> > it.
> >
> > Signed-off-by: Alexander Duyck <alexanderduyck@...com>
> > ---
> >  net/ipv4/tcp_input.c |   30 +++++++++++++++++++++++++++---
> >  1 file changed, 27 insertions(+), 3 deletions(-)
> >
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index 9e8a6c1aa019..79375b58de84 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -2686,11 +2686,35 @@ static void tcp_mtup_probe_success(struct sock *sk)
> >  void tcp_simple_retransmit(struct sock *sk)
> >  {
> >         const struct inet_connection_sock *icsk = inet_csk(sk);
> > +       struct sk_buff *skb = tcp_rtx_queue_head(sk);
> >         struct tcp_sock *tp = tcp_sk(sk);
> > -       struct sk_buff *skb;
> > -       unsigned int mss = tcp_current_mss(sk);
> > +       unsigned int mss;
> > +
> > +       /* A fastopen SYN request is stored as two separate packets within
> > +        * the retransmit queue, this is done by tcp_send_syn_data().
> > +        * As a result simply checking the MSS of the frames in the queue
> > +        * will not work for the SYN packet. So instead we must make a best
> > +        * effort attempt by validating the data frame with the mss size
> > +        * that would be computed now by tcp_send_syn_data and comparing
> > +        * that against the data frame that would have been included with
> > +        * the SYN.
> > +        */
> > +       if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN && tp->syn_data) {
> > +               struct sk_buff *syn_data = skb_rb_next(skb);
> > +
> > +               mss = tcp_mtu_to_mss(sk, icsk->icsk_pmtu_cookie) +
> > +                     tp->tcp_header_len - sizeof(struct tcphdr) -
> > +                     MAX_TCP_OPTION_SPACE;
> nice comment! The original syn_data mss needs to be inferred which is
> a hassle to get right. my sense is path-mtu issue is enough to warrant
> they are lost.
> I suggest simply mark syn & its data lost if tcp_simple_retransmit is
> called during TFO handshake, i.e.
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 62f7aabc7920..7f0c4f2947eb 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -2864,7 +2864,8 @@ void tcp_simple_retransmit(struct sock *sk)
>         unsigned int mss = tcp_current_mss(sk);
>
>         skb_rbtree_walk(skb, &sk->tcp_rtx_queue) {
> -               if (tcp_skb_seglen(skb) > mss)
> +               if (tcp_skb_seglen(skb) > mss ||
> +                   (tp->syn_data && sk->sk_state == TCP_SYN_SENT))
>                         tcp_mark_skb_lost(sk, skb);
>         }
>
> We have a TFO packetdrill test that verifies my suggested fix should
> trigger an immediate retransmit vs 1s wait.

Okay, I will go that route, although I will still probably make one
minor cleanup. Instead of testing for syn_data and state per packet I
will probably keep the bit where I overwrite mss since it is only used
in the loop. What I can do is switch it from unsigned int to int since
technically tcp_current_mss and tcp_skb_seglen are both a signed int
anyway. Then I can just set mss to -1 in the syn_data && TCP_SYN_SENT
case. That way all of the frames in the ring should fail the check
while only having to add one initial check outside the loop.

Powered by blists - more mailing lists