Message-ID: <20200131074116.8684-1-sjpark@amazon.com>
Date: Fri, 31 Jan 2020 08:41:16 +0100
From: <sjpark@...zon.com>
To: Eric Dumazet <edumazet@...gle.com>
CC: <sjpark@...zon.com>, David Miller <davem@...emloft.net>,
"Alexei Starovoitov" <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
"Martin KaFai Lau" <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
<andriin@...com>, netdev <netdev@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
<aams@...zon.com>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
<dola@...zon.com>
Subject: Re: Re: Re: Latency spikes occurs from frequent socket connections
On Thu, 30 Jan 2020 09:02:08 -0800 Eric Dumazet <edumazet@...gle.com> wrote:
> On Thu, Jan 30, 2020 at 4:41 AM <sjpark@...zon.com> wrote:
> >
> > On Wed, 29 Jan 2020 09:52:43 -0800 Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > > On Wed, Jan 29, 2020 at 9:14 AM <sjpark@...zon.com> wrote:
> > > >
> > > > Hello,
> > > >
> > > >
> > > > We found races in the kernel code that incur latency spikes. We thus would
> > > > like to share our investigations and hear your opinions.
> > > >
[...]
> > >
> > > I would rather try to fix the issue more generically, without adding
> > > extra lookups as you did, since they might appear
> > > to reduce the race, but not completely fix it.
> > >
> > > For example, the fact that the client side ignores the RST and
> > > retransmits a SYN after one second might be something that should be
> > > fixed.
> >
> > I also agree with this direction. It seems that detecting this situation and
> > adjusting the return value of tcp_timeout_init() to a value much lower than
> > one second would be a straightforward solution. As a test, I modified the
> > function to return 1 (4ms for CONFIG_HZ=250) and confirmed that the reproducer
> > stays silent. My follow-up question is how we can detect this situation in the
> > kernel. I'm unsure how we can distinguish this specific case from other cases,
> > as everything is working as normal according to the TCP protocol.
> >
> > Also, it seems the value is made to be adjustable from the user space using the
> > bpf callback, BPF_SOCK_OPS_TIMEOUT_INIT:
> >
> > BPF_SOCK_OPS_TIMEOUT_INIT, /* Should return SYN-RTO value to use or
> > * -1 if default value should be used
> > */
> >
> > Thus, it sounds like you are suggesting to do the detection and adjustment from
> > user space. Am I understanding your point? If not, please let me know.
> >
>
> No, I was suggesting to implement a mitigation in the kernel:
>
> When in SYN_SENT state, receiving a suspicious ACK should not
> simply trigger a RST.
>
> There may be multiple ways to address the issue.
>
> 1) Abort the SYN_SENT state and let user space receive an error to its
> connect() immediately.
>
> 2) Instead of a RST, allow the first SYN retransmit to happen immediately
> (This is kind of a challenge SYN. Kernel already implements challenge acks)
>
> 3) After RST is sent (to hopefully clear the state of the remote),
> schedule a SYN rtx in a few ms,
> instead of ~ one second.
Thank you for this kind comment, Eric! I would prefer the second and third
ideas over the first one. Anyway, I will send a patch soon, and I will add a
kselftest for this case, too.
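
For reference, the timing arithmetic behind the third idea and the 4ms figure
quoted above can be modeled in user space roughly as below. This is only a
sketch under my own assumptions, not kernel code and not the patch itself:
jiffies_to_ms(), ms_to_jiffies(), syn_rto_jiffies(), and the 'suspicious_ack'
flag are hypothetical names made up for illustration, and how the kernel would
actually detect the situation is still open.

```c
#include <stdbool.h>

/* User-space model: convert jiffies to milliseconds for a given CONFIG_HZ. */
static unsigned int jiffies_to_ms(unsigned int jiffies, unsigned int hz)
{
	return jiffies * 1000u / hz;
}

/*
 * User-space model: convert milliseconds to jiffies, rounding up so that a
 * non-zero delay never collapses to zero ticks.
 */
static unsigned int ms_to_jiffies(unsigned int ms, unsigned int hz)
{
	return (ms * hz + 999u) / 1000u;
}

/*
 * Hypothetical helper: choose the delay (in jiffies) before the first SYN
 * retransmit.  If a suspicious ACK was seen in SYN_SENT (assumed to be
 * detected elsewhere), use a short delay of 'fast_ms'; otherwise keep the
 * usual initial timeout of one second, i.e. 'hz' jiffies.
 */
static unsigned int syn_rto_jiffies(bool suspicious_ack, unsigned int fast_ms,
				    unsigned int hz)
{
	if (suspicious_ack)
		return ms_to_jiffies(fast_ms, hz);
	return hz;
}
```

With CONFIG_HZ=250, jiffies_to_ms(1, 250) gives 4, which matches the value I
used in my test, and syn_rto_jiffies(true, 4, 250) gives 1 jiffy instead of
the 250 jiffies used in the normal case.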
Thanks,
SeongJae Park
[...]