lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 5 Mar 2021 06:33:50 +0100
From:   Eric Dumazet <edumazet@...gle.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     Jonathan Lemon <jonathan.lemon@...il.com>,
        Alexander Duyck <alexander.duyck@...il.com>,
        David Miller <davem@...emloft.net>,
        netdev <netdev@...r.kernel.org>,
        kernel-team <kernel-team@...com>, Neil Spring <ntspring@...com>,
        Yuchung Cheng <ycheng@...gle.com>
Subject: Re: [PATCH net] net: tcp: don't allocate fast clones for fastopen SYN

On Fri, Mar 5, 2021 at 6:17 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Fri, Mar 5, 2021 at 12:27 AM Jakub Kicinski <kuba@...nel.org> wrote:
> >
> > On Thu, 4 Mar 2021 22:26:58 +0100 Eric Dumazet wrote:
> > > It would be nice if tun driver would have the ability to delay TX
> > > completions by N usecs,
> > > so that packetdrill tests could be used.
> > >
> > > It is probably not too hard to add such a feature.
> >
> > Add an ioctl to turn off the skb_orphan, queue the skbs in tun_do_read()
> > to free them from a timer?
>
> Yes, I  cooked a prototype like that a few hours ago before my night
> shift to launch our test suite.
>
> I yet have to add a sane limit to the number of delayed skbs so that
> syzbot does not oom its hosts ;)
>
> >
> > > I was testing this (note how I also removed the tcp_rearm_rto(sk) call)
> > >
> > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > index 6f450e577975c7be9537338c8a4c0673d7d36c4c..9ef92ca55e530f76ad793d7342442c4ec06165f7
> > > 100644
> > > --- a/net/ipv4/tcp_input.c
> > > +++ b/net/ipv4/tcp_input.c
> > > @@ -6471,11 +6471,10 @@ static bool tcp_rcv_fastopen_synack(struct
> > > sock *sk, struct sk_buff *synack,
> > >         tcp_fastopen_cache_set(sk, mss, cookie, syn_drop, try_exp);
> > >
> > >         if (data) { /* Retransmit unacked data in SYN */
> > > -               skb_rbtree_walk_from(data) {
> > > -                       if (__tcp_retransmit_skb(sk, data, 1))
> > > -                               break;
> > > -               }
> > > -               tcp_rearm_rto(sk);
> > > +               skb_rbtree_walk_from(data)
> > > +                       tcp_mark_skb_lost(sk, data);
> > > +
> > > +               tcp_xmit_retransmit_queue(sk);
> > >                 NET_INC_STATS(sock_net(sk),
> > >                                 LINUX_MIB_TCPFASTOPENACTIVEFAIL);
> > >                 return true;
> >
> > AFAICT this works great now:
>
> Yes but some packetdrill tests fail, I have to check them (sponge link
> for Googlers), but at first glance we have more investigations.
>
> Ran 2003 tests: 1990 passed, 13 failed
> Sponge: http://sponge2/b0d4c652-3173-4837-86ef-5e8cc59730a1
>
> Failures in
> //net/tcp/fastopen/client:ipv4-mapped-ipv6:cookie-disabled-prod-conn
> //net/tcp/fastopen/client:ipv4-mapped-ipv6:syn-data-icmp-unreach-frag-needed
> //net/tcp/fastopen/client:ipv4-mapped-ipv6:syn-data-icmp-unreach-frag-needed-with-seq
> //net/tcp/fastopen/client:ipv4-mapped-ipv6:syn-data-only-syn-acked
> //net/tcp/fastopen/client:ipv4-mapped-ipv6:syn-data-partial-or-over-ack
> //net/tcp/fastopen/client:ipv4:cookie-disabled-prod-conn
> //net/tcp/fastopen/client:ipv4:syn-data-icmp-unreach-frag-needed
> //net/tcp/fastopen/client:ipv4:syn-data-icmp-unreach-frag-needed-with-seq
> //net/tcp/fastopen/client:ipv4:syn-data-only-syn-acked
> //net/tcp/fastopen/client:ipv4:syn-data-partial-or-over-ack
> //net/tcp/fastopen/client:ipv6:cookie-disabled-prod-conn
> //net/tcp/fastopen/client:ipv6:syn-data-only-syn-acked
> //net/tcp/fastopen/client:ipv6:syn-data-partial-or-over-ack
>
> Showing test.log for
> //net/tcp/fastopen/client:ipv4:syn-data-icmp-unreach-frag-needed
>
> syn-data-icmp-unreach-frag-needed.pkt:33: error handling packet: live
> packet field ipv4_total_length: expected: 800 (0x320) vs actual: 1040
> (0x410)
> script packet: 0.090624 . 1:761(760) ack 1
> actual packet: 0.090619 P. 1:1001(1000) ack 1 win 256
>

Actually this might be a good thing : We now resend the whole data in
a TSO packet, now
we call the standard rtx queue mechanism, instead of sending only one MSS

I will look at other test failures today (Friday) before submitting a
patch series officially.

>
> Yes, it looks like Alex patch no longer works
>
> commit c31b70c9968fe9c4194d1b5d06d07596a3b680de
> Author: Alexander Duyck <alexanderduyck@...com>
> Date:   Sat Dec 12 12:31:24 2020 -0800
>
>     tcp: Add logic to check for SYN w/ data in tcp_simple_retransmit
>
>
>
> >
> > ==> TFO case ret:-16 (0) ca_state:0 skb:ffff8881d3513800!
> >   FREED swapper/5 -- skb 0xffff8881d3513800 freed after: 1us
> > -----
> > First:
> >         __tcp_retransmit_skb+1
> >         tcp_retransmit_skb+18
> >         tcp_xmit_retransmit_queue.part.70+339
> >         tcp_rcv_state_process+2491
> >         tcp_v6_do_rcv+405
> >         tcp_v6_rcv+2984
> >
> > Second:
> >         __tcp_retransmit_skb+1
> >         tcp_retransmit_skb+18
> >         tcp_xmit_retransmit_queue.part.70+339
> >         tcp_tsq_write.part.71+146
> >         tcp_tsq_handler+53
> >         tcp_tasklet_func+181
> >
> >  sk:0xffff8885adc16f00 skb:ffff8881d3513800 --- 61us acked:1
> >
> >
> > The other case where we hit RTO after __tcp_retransmit_skb() fails is:
> >
> > ==> non-TFO case ret:-11 (0) ca_state:3 skb:ffff8883d71dd400!
> > -----
> > First:
> >         __tcp_retransmit_skb+1
> >         tcp_retransmit_skb+18
> >         tcp_xmit_retransmit_queue.part.70+339
> >         tcp_ack+2270
> >         tcp_rcv_established+303
> >         tcp_v6_do_rcv+190
> >
> > Second:
> >         __tcp_retransmit_skb+1
> >         tcp_retransmit_skb+18
> >         tcp_retransmit_timer+716
> >         tcp_write_timer_handler+136
> >         tcp_write_timer+141
> >         call_timer_fn+43
> >
> >  sk:0xffff88801772d340 skb:ffff8883d71dd400 --- 51738us acked:47324
> >
> > Which I believe is this:
> >
> >         if (refcount_read(&sk->sk_wmem_alloc) >
> >             min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
> >                   sk->sk_sndbuf))
> >                 return -EAGAIN;
> >
> > Because __tcp_retransmit_skb() seems to bail before
> > inet6_sk_rebuild_header gets called. Should we arm TSQ here as well?
>
> Yes, or make the limit slightly bigger since we now have the fclone
> more precise thing for standard drivers [1]
>
> if ((refcount_read(&sk->sk_wmem_alloc) >> 1) >
>     min_t(u32, sk->sk_wmem_queued,  sk->sk_sndbuf))
>    return -EAGAIN;
>
> [1] It is probably wise to keep this code because some drivers do call
> skb_orphan() in their ndo_start_xmit()

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ