lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKgT0UdXiFBW9oDwvsFPe_ZoGveHLGh6RXf55jaL6kOYPEh0Hg@mail.gmail.com>
Date:   Wed, 3 Mar 2021 13:35:53 -0800
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     Jakub Kicinski <kuba@...nel.org>,
        David Miller <davem@...emloft.net>,
        netdev <netdev@...r.kernel.org>,
        kernel-team <kernel-team@...com>, Neil Spring <ntspring@...com>
Subject: Re: [PATCH net] net: tcp: don't allocate fast clones for fastopen SYN

On Tue, Mar 2, 2021 at 1:37 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Tue, Mar 2, 2021 at 7:08 AM Jakub Kicinski <kuba@...nel.org> wrote:
> >
> > When receiver does not accept TCP Fast Open it will only ack
> > the SYN, and not the data. We detect this and immediately queue
> > the data for (re)transmission in tcp_rcv_fastopen_synack().
> >
> > In DC networks with very low RTT and without RFS the SYN-ACK
> > may arrive before NIC driver reported Tx completion on
> > the original SYN. In which case skb_still_in_host_queue()
> > returns true and sender will need to wait for the retransmission
> > timer to fire milliseconds later.
> >
> > Revert back to non-fast clone skbs, this way
> > skb_still_in_host_queue() won't prevent the recovery flow
> > from completing.
> >
> > Suggested-by: Eric Dumazet <edumazet@...gle.com>
> > Fixes: 355a901e6cf1 ("tcp: make connect() mem charging friendly")
>
> Hmmm, not sure if this Fixes: tag makes sense.
>
> Really, if we delay TX completions by say 10 ms, other parts of the
> stack will misbehave anyway.
>
> Also, backporting this patch up to linux-3.19 is going to be tricky.
>
> The real issue here is that skb_still_in_host_queue() can give a false positive.
>
> I have mixed feelings here, as you can read my answer :/
>
> Maybe skb_still_in_host_queue() signal should not be used when a part
> of the SKB has been received/acknowledged by the remote peer
> (in this case the SYN part).
>
> Alternative is that drivers unable to TX complete their skbs in a
> reasonable time should call skb_orphan()
>  to avoid skb_unclone() penalties (and this skb_still_in_host_queue() issue)
>
> If you really want to play and delay TX completions, maybe provide a
> way to disable skb_still_in_host_queue() globally,
> using a static key ?

The problem as I see it is that the original fclone isn't what we sent
out on the wire and that is confusing things. What we sent was a SYN
with data, but what we have now is just a data frame that hasn't been
put out on the wire yet.

I wonder if we couldn't get away with doing something like adding a
fourth option of SKB_FCLONE_MODIFIED that we could apply to fastopen
skbs? That would keep the skb_still_in_host queue from triggering as
we would be changing the state from SKB_FCLONE_ORIG to
SKB_FCLONE_MODIFIED for the skb we store in the retransmit queue. In
addition if we have to clone it again and the fclone reference count
is 1 we could reset it back to SKB_FCLONE_ORIG.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ