[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iJ3Bkmjq3Q7Hjk7k7hftkKVF2=xEFXAdZbDvJ7Zwv0jsQ@mail.gmail.com>
Date: Wed, 11 Dec 2019 11:43:59 -0800
From: Eric Dumazet <edumazet@...gle.com>
To: Neal Cardwell <ncardwell@...gle.com>
Cc: "David S . Miller" <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
Eric Dumazet <eric.dumazet@...il.com>,
Christoph Paasch <cpaasch@...le.com>,
Jason Baron <jbaron@...mai.com>,
Yuchung Cheng <ycheng@...gle.com>,
Soheil Hassas Yeganeh <soheil@...gle.com>
Subject: Re: [PATCH net] tcp: do not send empty skb from tcp_write_xmit()
On Wed, Dec 11, 2019 at 11:39 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Wed, Dec 11, 2019 at 11:17 AM Neal Cardwell <ncardwell@...gle.com> wrote:
> >
> > On Wed, Dec 11, 2019 at 2:34 AM Eric Dumazet <edumazet@...gle.com> wrote:
> > >
> > > Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from
> > > write queue in error cases") in linux-4.14 stable triggered
> > > various bugs. One of them has been fixed in commit ba2ddb43f270
> > > ("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but
> > > we still have crashes in some occasions.
> > >
> > > Root-cause is that when tcp_sendmsg() has allocated a fresh
> > > skb and could not append a fragment before being blocked
> > > in sk_stream_wait_memory(), tcp_write_xmit() might be called
> > > and decide to send this fresh and empty skb.
> > >
> > > Sending an empty packet is not only silly, it might have caused
> > > many issues we had in the past with tp->packets_out being
> > > out of sync.
> > >
> > > Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.")
> > > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> > > Cc: Christoph Paasch <cpaasch@...le.com>
> > > Cc: Neal Cardwell <ncardwell@...gle.com>
> > > Cc: Jason Baron <jbaron@...mai.com>
> > > ---
> > > net/ipv4/tcp_output.c | 8 ++++++++
> > > 1 file changed, 8 insertions(+)
> > >
> > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > index b184f03d743715ef4b2d166ceae651529be77953..57f434a8e41ffd6bc584cb4d9e87703491a378c1 100644
> > > --- a/net/ipv4/tcp_output.c
> > > +++ b/net/ipv4/tcp_output.c
> > > @@ -2438,6 +2438,14 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
> > > if (tcp_small_queue_check(sk, skb, 0))
> > > break;
> > >
> > > + /* Argh, we hit an empty skb(), presumably a thread
> > > + * is sleeping in sendmsg()/sk_stream_wait_memory().
> > > + * We do not want to send a pure-ack packet and have
> > > + * a strange looking rtx queue with empty packet(s).
> > > + */
> > > + if (TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq)
> > > + break;
> > > +
> > > if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp)))
> > > break;
> > >
> > > --
> >
> > Thanks for the fix, Eric!
> >
> > Is there any risk that any current or future bugs that create
> > persistently empty skbs could cause the connection to "freeze", unable
> > to reach the tcp_transmit_skb() call in tcp_write_xmit()?
> >
> > To avoid this risk, would it make sense to delete the empty skb and
> > continue the tcp_write_xmit() transmit loop, rather than breaking out
> > of the loop?
>
> This 'empty' skb must be the last in the queue.
>
> Removing it from the queue would not prevent tcp_write_xmit() from
> breaking the loop.
>
> If we remove it, then we force sendmsg() to re-allocate a fresh skb,
> which seems more work, and would paper around the bugs that would be
> un-noticed.
>
> Another question to ask is if we need to reconsider how we use
> tcp_write_queue_empty() as an indicator
> for 'I have something in the write queue'
>
> This is obviously wrong if the write queue has a single empty skb.
>
> Maybe we should instead use tp->write_seq != tp->snd_nxt
Also we use skb_queue_len(&sk->sk_write_queue) == 0 in
tcp_sendmsg_locked() and this seems not good.
We could have changed tcp_sendmsg_locked() to leave the empty skb in
the write queue.
( and we can thus remove tcp_remove_empty_skb() completely since it
has caused many problems in stable kernels)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 8a39ee79489192c02385aaadc8d1ae969fb55d23..9ba3294de9cb4e929afdc0e00b01b6b534c84af6
100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1419,7 +1419,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct
msghdr *msg, size_t size)
sock_zerocopy_put_abort(uarg, true);
err = sk_stream_error(sk, flags, err);
/* make sure we wake any epoll edge trigger waiter */
- if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 &&
+ if (unlikely(tp->write_seq == tp->snd_nxt &&
err == -EAGAIN)) {
sk->sk_write_space(sk);
tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
Powered by blists - more mailing lists