[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iK_MORZmk4waXn3vfHYDZfz4AdqfQ5hrEr0Pwo8DMZG4w@mail.gmail.com>
Date: Fri, 5 Mar 2021 06:33:50 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Jonathan Lemon <jonathan.lemon@...il.com>,
Alexander Duyck <alexander.duyck@...il.com>,
David Miller <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
kernel-team <kernel-team@...com>, Neil Spring <ntspring@...com>,
Yuchung Cheng <ycheng@...gle.com>
Subject: Re: [PATCH net] net: tcp: don't allocate fast clones for fastopen SYN
On Fri, Mar 5, 2021 at 6:17 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Fri, Mar 5, 2021 at 12:27 AM Jakub Kicinski <kuba@...nel.org> wrote:
> >
> > On Thu, 4 Mar 2021 22:26:58 +0100 Eric Dumazet wrote:
> > > It would be nice if tun driver would have the ability to delay TX
> > > completions by N usecs,
> > > so that packetdrill tests could be used.
> > >
> > > It is probably not too hard to add such a feature.
> >
> > Add an ioctl to turn off the skb_orphan, queue the skbs in tun_do_read()
> > to free them from a timer?
>
> Yes, I cooked a prototype like that a few hours ago before my night
> shift to launch our test suite.
>
> I yet have to add a sane limit to the number of delayed skbs so that
> syzbot does not oom its hosts ;)
>
> >
> > > I was testing this (note how I also removed the tcp_rearm_rto(sk) call)
> > >
> > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > index 6f450e577975c7be9537338c8a4c0673d7d36c4c..9ef92ca55e530f76ad793d7342442c4ec06165f7
> > > 100644
> > > --- a/net/ipv4/tcp_input.c
> > > +++ b/net/ipv4/tcp_input.c
> > > @@ -6471,11 +6471,10 @@ static bool tcp_rcv_fastopen_synack(struct
> > > sock *sk, struct sk_buff *synack,
> > > tcp_fastopen_cache_set(sk, mss, cookie, syn_drop, try_exp);
> > >
> > > if (data) { /* Retransmit unacked data in SYN */
> > > - skb_rbtree_walk_from(data) {
> > > - if (__tcp_retransmit_skb(sk, data, 1))
> > > - break;
> > > - }
> > > - tcp_rearm_rto(sk);
> > > + skb_rbtree_walk_from(data)
> > > + tcp_mark_skb_lost(sk, data);
> > > +
> > > + tcp_xmit_retransmit_queue(sk);
> > > NET_INC_STATS(sock_net(sk),
> > > LINUX_MIB_TCPFASTOPENACTIVEFAIL);
> > > return true;
> >
> > AFAICT this works great now:
>
> Yes but some packetdrill tests fail, I have to check them (sponge link
> for Googlers), but at first glance we have more investigations.
>
> Ran 2003 tests: 1990 passed, 13 failed
> Sponge: http://sponge2/b0d4c652-3173-4837-86ef-5e8cc59730a1
>
> Failures in
> //net/tcp/fastopen/client:ipv4-mapped-ipv6:cookie-disabled-prod-conn
> //net/tcp/fastopen/client:ipv4-mapped-ipv6:syn-data-icmp-unreach-frag-needed
> //net/tcp/fastopen/client:ipv4-mapped-ipv6:syn-data-icmp-unreach-frag-needed-with-seq
> //net/tcp/fastopen/client:ipv4-mapped-ipv6:syn-data-only-syn-acked
> //net/tcp/fastopen/client:ipv4-mapped-ipv6:syn-data-partial-or-over-ack
> //net/tcp/fastopen/client:ipv4:cookie-disabled-prod-conn
> //net/tcp/fastopen/client:ipv4:syn-data-icmp-unreach-frag-needed
> //net/tcp/fastopen/client:ipv4:syn-data-icmp-unreach-frag-needed-with-seq
> //net/tcp/fastopen/client:ipv4:syn-data-only-syn-acked
> //net/tcp/fastopen/client:ipv4:syn-data-partial-or-over-ack
> //net/tcp/fastopen/client:ipv6:cookie-disabled-prod-conn
> //net/tcp/fastopen/client:ipv6:syn-data-only-syn-acked
> //net/tcp/fastopen/client:ipv6:syn-data-partial-or-over-ack
>
> Showing test.log for
> //net/tcp/fastopen/client:ipv4:syn-data-icmp-unreach-frag-needed
>
> syn-data-icmp-unreach-frag-needed.pkt:33: error handling packet: live
> packet field ipv4_total_length: expected: 800 (0x320) vs actual: 1040
> (0x410)
> script packet: 0.090624 . 1:761(760) ack 1
> actual packet: 0.090619 P. 1:1001(1000) ack 1 win 256
>
Actually this might be a good thing : We now resend the whole data in
a TSO packet, now
we call the standard rtx queue mechanism, instead of sending only one MSS
I will look at other test failures today (Friday) before submitting a
patch series officially.
>
> Yes, it looks like Alex patch no longer works
>
> commit c31b70c9968fe9c4194d1b5d06d07596a3b680de
> Author: Alexander Duyck <alexanderduyck@...com>
> Date: Sat Dec 12 12:31:24 2020 -0800
>
> tcp: Add logic to check for SYN w/ data in tcp_simple_retransmit
>
>
>
> >
> > ==> TFO case ret:-16 (0) ca_state:0 skb:ffff8881d3513800!
> > FREED swapper/5 -- skb 0xffff8881d3513800 freed after: 1us
> > -----
> > First:
> > __tcp_retransmit_skb+1
> > tcp_retransmit_skb+18
> > tcp_xmit_retransmit_queue.part.70+339
> > tcp_rcv_state_process+2491
> > tcp_v6_do_rcv+405
> > tcp_v6_rcv+2984
> >
> > Second:
> > __tcp_retransmit_skb+1
> > tcp_retransmit_skb+18
> > tcp_xmit_retransmit_queue.part.70+339
> > tcp_tsq_write.part.71+146
> > tcp_tsq_handler+53
> > tcp_tasklet_func+181
> >
> > sk:0xffff8885adc16f00 skb:ffff8881d3513800 --- 61us acked:1
> >
> >
> > The other case where we hit RTO after __tcp_retransmit_skb() fails is:
> >
> > ==> non-TFO case ret:-11 (0) ca_state:3 skb:ffff8883d71dd400!
> > -----
> > First:
> > __tcp_retransmit_skb+1
> > tcp_retransmit_skb+18
> > tcp_xmit_retransmit_queue.part.70+339
> > tcp_ack+2270
> > tcp_rcv_established+303
> > tcp_v6_do_rcv+190
> >
> > Second:
> > __tcp_retransmit_skb+1
> > tcp_retransmit_skb+18
> > tcp_retransmit_timer+716
> > tcp_write_timer_handler+136
> > tcp_write_timer+141
> > call_timer_fn+43
> >
> > sk:0xffff88801772d340 skb:ffff8883d71dd400 --- 51738us acked:47324
> >
> > Which I believe is this:
> >
> > if (refcount_read(&sk->sk_wmem_alloc) >
> > min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
> > sk->sk_sndbuf))
> > return -EAGAIN;
> >
> > Because __tcp_retransmit_skb() seems to bail before
> > inet6_sk_rebuild_header gets called. Should we arm TSQ here as well?
>
> Yes, or make the limit slightly bigger since we now have the fclone
> more precise thing for standard drivers [1]
>
> if ((refcount_read(&sk->sk_wmem_alloc) >> 1) >
> min_t(u32, sk->sk_wmem_queued, sk->sk_sndbuf))
> return -EAGAIN;
>
> [1] It is probably wise to keep this code because some drivers do call
> skb_orphan() in their ndo_start_xmit()
Powered by blists - more mailing lists