netdev - Re: [PATCH net] net: tcp: don't allocate fast clones for fastopen SYN

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20210304152709.4e91bd8b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date:   Thu, 4 Mar 2021 15:27:09 -0800
From:   Jakub Kicinski <kuba@...nel.org>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     Jonathan Lemon <jonathan.lemon@...il.com>,
        Alexander Duyck <alexander.duyck@...il.com>,
        David Miller <davem@...emloft.net>,
        netdev <netdev@...r.kernel.org>,
        kernel-team <kernel-team@...com>, Neil Spring <ntspring@...com>,
        Yuchung Cheng <ycheng@...gle.com>
Subject: Re: [PATCH net] net: tcp: don't allocate fast clones for fastopen
 SYN

On Thu, 4 Mar 2021 22:26:58 +0100 Eric Dumazet wrote:
> It would be nice if tun driver would have the ability to delay TX
> completions by N usecs,
> so that packetdrill tests could be used.
> 
> It is probably not too hard to add such a feature.

Add an ioctl to turn off the skb_orphan, queue the skbs in tun_do_read() 
to free them from a timer?

> I was testing this (note how I also removed the tcp_rearm_rto(sk) call)
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 6f450e577975c7be9537338c8a4c0673d7d36c4c..9ef92ca55e530f76ad793d7342442c4ec06165f7
> 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -6471,11 +6471,10 @@ static bool tcp_rcv_fastopen_synack(struct
> sock *sk, struct sk_buff *synack,
>         tcp_fastopen_cache_set(sk, mss, cookie, syn_drop, try_exp);
> 
>         if (data) { /* Retransmit unacked data in SYN */
> -               skb_rbtree_walk_from(data) {
> -                       if (__tcp_retransmit_skb(sk, data, 1))
> -                               break;
> -               }
> -               tcp_rearm_rto(sk);
> +               skb_rbtree_walk_from(data)
> +                       tcp_mark_skb_lost(sk, data);
> +
> +               tcp_xmit_retransmit_queue(sk);
>                 NET_INC_STATS(sock_net(sk),
>                                 LINUX_MIB_TCPFASTOPENACTIVEFAIL);
>                 return true;

AFAICT this works great now:

==> TFO case ret:-16 (0) ca_state:0 skb:ffff8881d3513800!
  FREED swapper/5 -- skb 0xffff8881d3513800 freed after: 1us
-----
First:
        __tcp_retransmit_skb+1
        tcp_retransmit_skb+18
        tcp_xmit_retransmit_queue.part.70+339
        tcp_rcv_state_process+2491
        tcp_v6_do_rcv+405
        tcp_v6_rcv+2984

Second:
        __tcp_retransmit_skb+1
        tcp_retransmit_skb+18
        tcp_xmit_retransmit_queue.part.70+339
        tcp_tsq_write.part.71+146
        tcp_tsq_handler+53
        tcp_tasklet_func+181

 sk:0xffff8885adc16f00 skb:ffff8881d3513800 --- 61us acked:1


The other case where we hit RTO after __tcp_retransmit_skb() fails is:

==> non-TFO case ret:-11 (0) ca_state:3 skb:ffff8883d71dd400!
-----
First:
        __tcp_retransmit_skb+1
        tcp_retransmit_skb+18
        tcp_xmit_retransmit_queue.part.70+339
        tcp_ack+2270
        tcp_rcv_established+303
        tcp_v6_do_rcv+190

Second:
        __tcp_retransmit_skb+1
        tcp_retransmit_skb+18
        tcp_retransmit_timer+716
        tcp_write_timer_handler+136
        tcp_write_timer+141
        call_timer_fn+43

 sk:0xffff88801772d340 skb:ffff8883d71dd400 --- 51738us acked:47324

Which I believe is this:

	if (refcount_read(&sk->sk_wmem_alloc) >
	    min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
		  sk->sk_sndbuf))
		return -EAGAIN;

Because __tcp_retransmit_skb() seems to bail before
inet6_sk_rebuild_header gets called. Should we arm TSQ here as well?