lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 11 Dec 2019 15:04:34 -0500
From:   Neal Cardwell <ncardwell@...gle.com>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     "David S . Miller" <davem@...emloft.net>,
        netdev <netdev@...r.kernel.org>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Christoph Paasch <cpaasch@...le.com>,
        Jason Baron <jbaron@...mai.com>,
        Yuchung Cheng <ycheng@...gle.com>,
        Soheil Hassas Yeganeh <soheil@...gle.com>
Subject: Re: [PATCH net] tcp: do not send empty skb from tcp_write_xmit()

On Wed, Dec 11, 2019 at 2:44 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Wed, Dec 11, 2019 at 11:39 AM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > On Wed, Dec 11, 2019 at 11:17 AM Neal Cardwell <ncardwell@...gle.com> wrote:
> > >
> > > On Wed, Dec 11, 2019 at 2:34 AM Eric Dumazet <edumazet@...gle.com> wrote:
> > > >
> > > > Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from
> > > > write queue in error cases") in linux-4.14 stable triggered
> > > > various bugs. One of them has been fixed in commit ba2ddb43f270
> > > > ("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but
> > > > we still have crashes in some occasions.
> > > >
> > > > Root-cause is that when tcp_sendmsg() has allocated a fresh
> > > > skb and could not append a fragment before being blocked
> > > > in sk_stream_wait_memory(), tcp_write_xmit() might be called
> > > > and decide to send this fresh and empty skb.
> > > >
> > > > Sending an empty packet is not only silly, it might have caused
> > > > many issues we had in the past with tp->packets_out being
> > > > out of sync.
> > > >
> > > > Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.")
> > > > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> > > > Cc: Christoph Paasch <cpaasch@...le.com>
> > > > Cc: Neal Cardwell <ncardwell@...gle.com>
> > > > Cc: Jason Baron <jbaron@...mai.com>
> > > > ---
> > > >  net/ipv4/tcp_output.c | 8 ++++++++
> > > >  1 file changed, 8 insertions(+)
> > > >
> > > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > > index b184f03d743715ef4b2d166ceae651529be77953..57f434a8e41ffd6bc584cb4d9e87703491a378c1 100644
> > > > --- a/net/ipv4/tcp_output.c
> > > > +++ b/net/ipv4/tcp_output.c
> > > > @@ -2438,6 +2438,14 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
> > > >                 if (tcp_small_queue_check(sk, skb, 0))
> > > >                         break;
> > > >
> > > > +               /* Argh, we hit an empty skb(), presumably a thread
> > > > +                * is sleeping in sendmsg()/sk_stream_wait_memory().
> > > > +                * We do not want to send a pure-ack packet and have
> > > > +                * a strange looking rtx queue with empty packet(s).
> > > > +                */
> > > > +               if (TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq)
> > > > +                       break;
> > > > +
> > > >                 if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp)))
> > > >                         break;
> > > >
> > > > --
> > >
> > > Thanks for the fix, Eric!
> > >
> > > Is there any risk that any current or future bugs that create
> > > persistently empty skbs could cause the connection to "freeze", unable
> > > to reach the tcp_transmit_skb() call in tcp_write_xmit()?
> > >
> > > To avoid this risk, would it make sense to delete the empty skb and
> > > continue the tcp_write_xmit() transmit loop, rather than breaking out
> > > of the loop?
> >
> > This 'empty' skb must be the last in the queue.
> >
> > Removing it from the queue would not prevent tcp_write_xmit() from
> > breaking the loop.
> >
> > If we remove it, then we force sendmsg() to re-allocate a fresh skb,
> > which seems more work, and would paper around the bugs that would be
> > un-noticed.
> >
> > Another question to ask is if we need to reconsider how we use
> > tcp_write_queue_empty() as an indicator
> > for 'I  have something in the write queue'
> >
> > This is obviously wrong if the write queue has a single empty skb.
> >
> > Maybe we should instead use tp->write_seq != tp->snd_nxt
>
> Also we use skb_queue_len(&sk->sk_write_queue) == 0 in
> tcp_sendmsg_locked() and this seems not good.
>
> We could have changed tcp_sendmsg_locked() to leave the empty skb in
> the write queue.
> ( and we can thus remove tcp_remove_empty_skb() completely since it
> has caused many problems in stable kernels)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 8a39ee79489192c02385aaadc8d1ae969fb55d23..9ba3294de9cb4e929afdc0e00b01b6b534c84af6
> 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1419,7 +1419,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct
> msghdr *msg, size_t size)
>         sock_zerocopy_put_abort(uarg, true);
>         err = sk_stream_error(sk, flags, err);
>         /* make sure we wake any epoll edge trigger waiter */
> -       if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 &&
> +       if (unlikely(tp->write_seq == tp->snd_nxt &&
>                      err == -EAGAIN)) {
>                 sk->sk_write_space(sk);
>                 tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);

Thanks, Eric.

I like your idea to audit the TCP calls to tcp_write_queue_empty() and
skb_queue_len(&sk->sk_write_queue) to see which ones should be
replaced with comparisons of tp->write_seq and tp->snd_nxt (including
this one). Good catch!

neal

Powered by blists - more mailing lists