[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iJGy7ktcmfOHiN58KyDDXdu=Y5H-L8qRgj6faFQjo9rwQ@mail.gmail.com>
Date: Tue, 31 Dec 2024 13:34:42 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Yafang Shao <laoar.shao@...il.com>
Cc: davem@...emloft.net, dsahern@...nel.org, kuba@...nel.org,
pabeni@...hat.com, horms@...nel.org, netdev@...r.kernel.org
Subject: Re: [PATCH net-next] net: tcp: Define tcp_listendrop() as noinline
On Tue, Dec 31, 2024 at 12:49 PM Yafang Shao <laoar.shao@...il.com> wrote:
>
> On Tue, Dec 31, 2024 at 5:26 PM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > On Tue, Dec 31, 2024 at 6:52 AM Yafang Shao <laoar.shao@...il.com> wrote:
> > >
> > > The LINUX_MIB_LISTENDROPS counter can be incremented for several reasons,
> > > such as:
> > > - A full SYN backlog queue
> > > - SYN flood attacks
> > > - Memory exhaustion
> > > - Other resource constraints
> > >
> > > Recently, one of our services experienced frequent ListenDrops. While
> > > attempting to trace the root cause, we discovered that tracing the function
> > > tcp_listendrop() was not straightforward because it is currently inlined.
> > > To overcome this, we had to create a livepatch that defined a non-inlined
> > > version of the function, which we then traced using BPF programs.
> > >
> > > $ grep tcp_listendrop /proc/kallsyms
> > > ffffffffc093fba0 t tcp_listendrop_x [livepatch_tmp]
> > >
> > > Through this process, we eventually determined that the ListenDrops were
> > > caused by SYN attacks.
> > >
> > > Since tcp_listendrop() is not part of the critical path, defining it as
> > > noinline would make it significantly easier to trace and diagnose issues
> > > without requiring workarounds such as livepatching. This function can be
> > > used by kernel modules like smc, so export it with EXPORT_SYMBOL_GPL().
> > >
> > > After that change, the result is as follows,
> > >
> > > $ grep tcp_listendrop /proc/kallsyms
> > > ffffffffb718eaa0 T __pfx_tcp_listendrop
> > > ffffffffb718eab0 T tcp_listendrop
> > > ffffffffb7e636b8 r __ksymtab_tcp_listendrop
> > >
> > > Signed-off-by: Yafang Shao <laoar.shao@...il.com>
> > > --
> >
> > Are we going to accept one patch at a time to un-inline all TCP
> > related functions in the kernel ?
>
> I don't have an in-depth understanding of the TCP/IP stack, so I can't
> consolidate all related TCP functions in the error paths into a single
> patch. I would greatly appreciate it if you could help ensure these
> functions are marked as non-inlined based on your expertise. If you
> don't have time to do it directly, could you provide some guidance?
>
> The inlining of TCP functions in error paths often complicates tracing
> efforts. For instance, we recently encountered issues with the inlined
> tcp_write_err(), although we were fortunate to have an alternative
> tracepoint available: inet_sk_error_report.
>
> Unfortunately, no such alternative exists for tcp_listendrop(), which
> is why I submitted this patch.
>
> >
> > My understanding is that current tools work fine. You may need to upgrade yours.
> >
> > # perf probe -a tcp_listendrop
> > Added new events:
> > probe:tcp_listendrop (on tcp_listendrop)
> > probe:tcp_listendrop (on tcp_listendrop)
> > probe:tcp_listendrop (on tcp_listendrop)
> > probe:tcp_listendrop (on tcp_listendrop)
> > probe:tcp_listendrop (on tcp_listendrop)
> > probe:tcp_listendrop (on tcp_listendrop)
> > probe:tcp_listendrop (on tcp_listendrop)
> > probe:tcp_listendrop (on tcp_listendrop)
> >
> > You can now use it in all perf tools, such as:
> >
> > perf record -e probe:tcp_listendrop -aR sleep 1
>
> The issue is that we can't extract the struct `sock *sk` from
> tcp_listendrop() using perf. Please correct me if I’m mistaken.
>
> In a containerized environment, it's common to have many pods running
> on a single host, each assigned a unique IP. Our goal is to filter for
> a specific local_ip:local_port combination while also capturing the
> corresponding remote IP. This task is straightforward with bpftrace or
> other BPF-based tools, but I’m unsure how to accomplish it using perf.
I suggest you ask tracing experts' opinions, why is perf able to add a
probe, but not bpftrace ?
Again, accepting thousands of patches just because of tracing tools
seems not good to me.
Powered by blists - more mailing lists