[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iJKH37CMUuq66nERZQoMHFp+yuTe=yqxm1kf+RQ1RfHzw@mail.gmail.com>
Date: Fri, 5 Mar 2021 06:17:21 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Jonathan Lemon <jonathan.lemon@...il.com>,
Alexander Duyck <alexander.duyck@...il.com>,
David Miller <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
kernel-team <kernel-team@...com>, Neil Spring <ntspring@...com>,
Yuchung Cheng <ycheng@...gle.com>
Subject: Re: [PATCH net] net: tcp: don't allocate fast clones for fastopen SYN
On Fri, Mar 5, 2021 at 12:27 AM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Thu, 4 Mar 2021 22:26:58 +0100 Eric Dumazet wrote:
> > It would be nice if tun driver would have the ability to delay TX
> > completions by N usecs,
> > so that packetdrill tests could be used.
> >
> > It is probably not too hard to add such a feature.
>
> Add an ioctl to turn off the skb_orphan, queue the skbs in tun_do_read()
> to free them from a timer?
Yes, I cooked a prototype like that a few hours ago before my night
shift to launch our test suite.
I yet have to add a sane limit to the number of delayed skbs so that
syzbot does not oom its hosts ;)
>
> > I was testing this (note how I also removed the tcp_rearm_rto(sk) call)
> >
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index 6f450e577975c7be9537338c8a4c0673d7d36c4c..9ef92ca55e530f76ad793d7342442c4ec06165f7
> > 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -6471,11 +6471,10 @@ static bool tcp_rcv_fastopen_synack(struct
> > sock *sk, struct sk_buff *synack,
> > tcp_fastopen_cache_set(sk, mss, cookie, syn_drop, try_exp);
> >
> > if (data) { /* Retransmit unacked data in SYN */
> > - skb_rbtree_walk_from(data) {
> > - if (__tcp_retransmit_skb(sk, data, 1))
> > - break;
> > - }
> > - tcp_rearm_rto(sk);
> > + skb_rbtree_walk_from(data)
> > + tcp_mark_skb_lost(sk, data);
> > +
> > + tcp_xmit_retransmit_queue(sk);
> > NET_INC_STATS(sock_net(sk),
> > LINUX_MIB_TCPFASTOPENACTIVEFAIL);
> > return true;
>
> AFAICT this works great now:
Yes but some packetdrill tests fail, I have to check them (sponge link
for Googlers), but at first glance we have more investigations.
Ran 2003 tests: 1990 passed, 13 failed
Sponge: http://sponge2/b0d4c652-3173-4837-86ef-5e8cc59730a1
Failures in
//net/tcp/fastopen/client:ipv4-mapped-ipv6:cookie-disabled-prod-conn
//net/tcp/fastopen/client:ipv4-mapped-ipv6:syn-data-icmp-unreach-frag-needed
//net/tcp/fastopen/client:ipv4-mapped-ipv6:syn-data-icmp-unreach-frag-needed-with-seq
//net/tcp/fastopen/client:ipv4-mapped-ipv6:syn-data-only-syn-acked
//net/tcp/fastopen/client:ipv4-mapped-ipv6:syn-data-partial-or-over-ack
//net/tcp/fastopen/client:ipv4:cookie-disabled-prod-conn
//net/tcp/fastopen/client:ipv4:syn-data-icmp-unreach-frag-needed
//net/tcp/fastopen/client:ipv4:syn-data-icmp-unreach-frag-needed-with-seq
//net/tcp/fastopen/client:ipv4:syn-data-only-syn-acked
//net/tcp/fastopen/client:ipv4:syn-data-partial-or-over-ack
//net/tcp/fastopen/client:ipv6:cookie-disabled-prod-conn
//net/tcp/fastopen/client:ipv6:syn-data-only-syn-acked
//net/tcp/fastopen/client:ipv6:syn-data-partial-or-over-ack
Showing test.log for
//net/tcp/fastopen/client:ipv4:syn-data-icmp-unreach-frag-needed
syn-data-icmp-unreach-frag-needed.pkt:33: error handling packet: live
packet field ipv4_total_length: expected: 800 (0x320) vs actual: 1040
(0x410)
script packet: 0.090624 . 1:761(760) ack 1
actual packet: 0.090619 P. 1:1001(1000) ack 1 win 256
Yes, it looks like Alex patch no longer works
commit c31b70c9968fe9c4194d1b5d06d07596a3b680de
Author: Alexander Duyck <alexanderduyck@...com>
Date: Sat Dec 12 12:31:24 2020 -0800
tcp: Add logic to check for SYN w/ data in tcp_simple_retransmit
>
> ==> TFO case ret:-16 (0) ca_state:0 skb:ffff8881d3513800!
> FREED swapper/5 -- skb 0xffff8881d3513800 freed after: 1us
> -----
> First:
> __tcp_retransmit_skb+1
> tcp_retransmit_skb+18
> tcp_xmit_retransmit_queue.part.70+339
> tcp_rcv_state_process+2491
> tcp_v6_do_rcv+405
> tcp_v6_rcv+2984
>
> Second:
> __tcp_retransmit_skb+1
> tcp_retransmit_skb+18
> tcp_xmit_retransmit_queue.part.70+339
> tcp_tsq_write.part.71+146
> tcp_tsq_handler+53
> tcp_tasklet_func+181
>
> sk:0xffff8885adc16f00 skb:ffff8881d3513800 --- 61us acked:1
>
>
> The other case where we hit RTO after __tcp_retransmit_skb() fails is:
>
> ==> non-TFO case ret:-11 (0) ca_state:3 skb:ffff8883d71dd400!
> -----
> First:
> __tcp_retransmit_skb+1
> tcp_retransmit_skb+18
> tcp_xmit_retransmit_queue.part.70+339
> tcp_ack+2270
> tcp_rcv_established+303
> tcp_v6_do_rcv+190
>
> Second:
> __tcp_retransmit_skb+1
> tcp_retransmit_skb+18
> tcp_retransmit_timer+716
> tcp_write_timer_handler+136
> tcp_write_timer+141
> call_timer_fn+43
>
> sk:0xffff88801772d340 skb:ffff8883d71dd400 --- 51738us acked:47324
>
> Which I believe is this:
>
> if (refcount_read(&sk->sk_wmem_alloc) >
> min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
> sk->sk_sndbuf))
> return -EAGAIN;
>
> Because __tcp_retransmit_skb() seems to bail before
> inet6_sk_rebuild_header gets called. Should we arm TSQ here as well?
Yes, or make the limit slightly bigger since we now have the fclone
more precise thing for standard drivers [1]
if ((refcount_read(&sk->sk_wmem_alloc) >> 1) >
min_t(u32, sk->sk_wmem_queued, sk->sk_sndbuf))
return -EAGAIN;
[1] It is probably wise to keep this code because some drivers do call
skb_orphan() in their ndo_start_xmit()
Powered by blists - more mailing lists