lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 11 Dec 2019 11:43:59 -0800
From:   Eric Dumazet <edumazet@...gle.com>
To:     Neal Cardwell <ncardwell@...gle.com>
Cc:     "David S . Miller" <davem@...emloft.net>,
        netdev <netdev@...r.kernel.org>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Christoph Paasch <cpaasch@...le.com>,
        Jason Baron <jbaron@...mai.com>,
        Yuchung Cheng <ycheng@...gle.com>,
        Soheil Hassas Yeganeh <soheil@...gle.com>
Subject: Re: [PATCH net] tcp: do not send empty skb from tcp_write_xmit()

On Wed, Dec 11, 2019 at 11:39 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Wed, Dec 11, 2019 at 11:17 AM Neal Cardwell <ncardwell@...gle.com> wrote:
> >
> > On Wed, Dec 11, 2019 at 2:34 AM Eric Dumazet <edumazet@...gle.com> wrote:
> > >
> > > Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from
> > > write queue in error cases") in linux-4.14 stable triggered
> > > various bugs. One of them has been fixed in commit ba2ddb43f270
> > > ("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but
> > > we still have crashes in some occasions.
> > >
> > > Root-cause is that when tcp_sendmsg() has allocated a fresh
> > > skb and could not append a fragment before being blocked
> > > in sk_stream_wait_memory(), tcp_write_xmit() might be called
> > > and decide to send this fresh and empty skb.
> > >
> > > Sending an empty packet is not only silly, it might have caused
> > > many issues we had in the past with tp->packets_out being
> > > out of sync.
> > >
> > > Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.")
> > > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> > > Cc: Christoph Paasch <cpaasch@...le.com>
> > > Cc: Neal Cardwell <ncardwell@...gle.com>
> > > Cc: Jason Baron <jbaron@...mai.com>
> > > ---
> > >  net/ipv4/tcp_output.c | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > index b184f03d743715ef4b2d166ceae651529be77953..57f434a8e41ffd6bc584cb4d9e87703491a378c1 100644
> > > --- a/net/ipv4/tcp_output.c
> > > +++ b/net/ipv4/tcp_output.c
> > > @@ -2438,6 +2438,14 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
> > >                 if (tcp_small_queue_check(sk, skb, 0))
> > >                         break;
> > >
> > > +               /* Argh, we hit an empty skb(), presumably a thread
> > > +                * is sleeping in sendmsg()/sk_stream_wait_memory().
> > > +                * We do not want to send a pure-ack packet and have
> > > +                * a strange looking rtx queue with empty packet(s).
> > > +                */
> > > +               if (TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq)
> > > +                       break;
> > > +
> > >                 if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp)))
> > >                         break;
> > >
> > > --
> >
> > Thanks for the fix, Eric!
> >
> > Is there any risk that any current or future bugs that create
> > persistently empty skbs could cause the connection to "freeze", unable
> > to reach the tcp_transmit_skb() call in tcp_write_xmit()?
> >
> > To avoid this risk, would it make sense to delete the empty skb and
> > continue the tcp_write_xmit() transmit loop, rather than breaking out
> > of the loop?
>
> This 'empty' skb must be the last in the queue.
>
> Removing it from the queue would not prevent tcp_write_xmit() from
> breaking the loop.
>
> If we remove it, then we force sendmsg() to re-allocate a fresh skb,
> which seems more work, and would paper around the bugs that would be
> un-noticed.
>
> Another question to ask is if we need to reconsider how we use
> tcp_write_queue_empty() as an indicator
> for 'I  have something in the write queue'
>
> This is obviously wrong if the write queue has a single empty skb.
>
> Maybe we should instead use tp->write_seq != tp->snd_nxt

Also we use skb_queue_len(&sk->sk_write_queue) == 0 in
tcp_sendmsg_locked() and this seems not good.

We could have changed tcp_sendmsg_locked() to leave the empty skb in
the write queue.
( and we can thus remove tcp_remove_empty_skb() completely since it
has caused many problems in stable kernels)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 8a39ee79489192c02385aaadc8d1ae969fb55d23..9ba3294de9cb4e929afdc0e00b01b6b534c84af6
100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1419,7 +1419,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct
msghdr *msg, size_t size)
        sock_zerocopy_put_abort(uarg, true);
        err = sk_stream_error(sk, flags, err);
        /* make sure we wake any epoll edge trigger waiter */
-       if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 &&
+       if (unlikely(tp->write_seq == tp->snd_nxt &&
                     err == -EAGAIN)) {
                sk->sk_write_space(sk);
                tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ